Property Value
?:about
?:abstract
  • Fine-tune Transformer models within R (xsd:string)
?:alternateName
  • fine-tuning Transformers for text data from within R (xsd:string)
?:associatedTask
?:codeRepository
?:contributor
?:dateModified
  • 2025 (xsd:gyear)
?:datePublished
  • 2025 (xsd:gyear)
?:description
  • Duct tape the ‘quanteda’ ecosystem (Benoit et al., 2018) doi:10.21105/joss.00774 to modern Transformer-based text classification models (Wolf et al., 2020) doi:10.18653/v1/2020.emnlp-demos.6 in order to facilitate supervised machine learning for textual data. This package mimics the behavior of ‘quanteda.textmodels’ and provides a function to set up the ‘Python’ environment needed to use the pretrained models from ‘Hugging Face’ (https://huggingface.co/). More information: doi:10.5117/CCR2023.1.003.CHAN

    Keywords
    Deep Learning, Supervised machine learning, Text analysis

    Use Cases
    This package can be used in any typical supervised machine learning use case involving text data. The software paper (Chan, 2023) presents several cases, e.g. predicting incivility in tweets (Theocharis et al., 2020).

    Input Data
    grafzahl accepts text data either as a character vector or as a quanteda corpus.

    Sample Input and Output Data
    A sample input is a corpus. This is an example dataset:

        library(grafzahl)
        library(quanteda)
        unciviltweets

        Corpus consisting of 19,982 documents and 1 docvar.
        text1 : "@ @ Karma gave you a second chance yesterday. Start doing m..."
        text2 : "@ With people like you, Steve King there's still hope for we..."
        text3 : "@ @ You bill is a joke and will sink the GOP. #WEDESERVEBETT..."
        text4 : "@ Dream on. The only thing trump understands is how to enric..."
        text5 : "@ @ Just like the Democrat taliban party was up front with t..."
        text6 : "@ you are going to have more of the same with HRC, and you a..."
        [ reached max_ndoc ... 19,976 more documents ]

    The output is an S3 object.

    Hardware Requirements
    grafzahl runs on any machine that can run R. A GPU that supports CUDA is optional.

    Environment Setup
    With R installed:

        install.packages("grafzahl")

    How to Use
    Before training, set up the conda environment:

        setup_grafzahl(cuda = TRUE) ## if you have GPU(s)

    A typical way to train a model and make predictions:

        input <- corpus(ecosent, text_field = "headline")
        training_corpus <- corpus_subset(input, !gold)

    Use the parameters x (the text data), y (the label, in this case a docvar), and model_name (the name of a model on Hugging Face) to control how the supervised machine learning model is trained.

        model2 <- grafzahl(x = training_corpus, y = "value", model_name = "GroNLP/bert-base-dutch-cased")
        test_corpus <- corpus_subset(input, gold)
        predict(model2, test_corpus)

    Technical Details
    See the publication for the tested and selected models and parameters, the reasoning behind the model selection, and the datasets used for training.

    References
    Chan, C. H. (2023). grafzahl: fine-tuning Transformers for text data from within R. Computational Communication Research, 5(1), 76. https://doi.org/10.5117/CCR2023.1.003.CHAN

    Contact Details
    Maintainer: Chung-hong Chan chainsawtiney@gmail.com
    Issue Tracker: https://github.com/gesistsa/grafzahl/issues (xsd:string)
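Once predict(model2, test_corpus) has returned labels for the held-out documents, a natural next step is to compare them with the gold labels. The following is a minimal base-R sketch and not part of grafzahl. It assumes that predict() yields one label per document. The vectors gold_labels and predicted_labels are hypothetical stand-ins for the label docvar of the test corpus and the prediction output, and their values are invented for illustration.

```r
# Hypothetical gold labels and predictions (illustrative values only;
# in practice these would come from the test corpus and from predict()).
gold_labels <- c("pos", "neg", "neu", "neg", "pos", "neu")
predicted_labels <- c("pos", "neg", "neg", "neg", "neu", "neu")

# Use a shared set of levels so the confusion matrix stays square
# even if some class is never predicted.
lv <- sort(unique(c(gold_labels, predicted_labels)))
confusion <- table(
  gold = factor(gold_labels, levels = lv),
  predicted = factor(predicted_labels, levels = lv)
)

# Accuracy: share of documents whose predicted label equals the gold label.
accuracy <- sum(diag(confusion)) / sum(confusion)

print(confusion)
print(accuracy)
```

Fixing the factor levels up front is a deliberate choice: without it, table() would drop classes that never occur on one side, and diag() would no longer line up gold and predicted classes.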
?:downloadURL
?:format
  • SCRIPTS (de)
  • SCRIPTS (en)
is ?:hasPart of
?:license
  • GPL-3.0-only (xsd:string)
?:linksDocumentation
?:name
  • grafzahl (xsd:string)
?:portalUrl
?:programmingLanguage
  • R (de)
  • R (en)
?:relatedPublication
?:sourceInfo
  • GESIS-Methods Hub (xsd:string)
rdf:type
?:version
  • 904066193116b1e0646640d432a0c15a10da5c8d (xsd:string)