methodshub-grafzahl

Property	Value
?:about	<https://data.gesis.org/gesiskg/resource/Deep_Learning> <https://data.gesis.org/gesiskg/resource/Supervised_machine_learning> <https://data.gesis.org/gesiskg/resource/Text_analysis>
?:abstract	Fine-tune Transformer models within R (xsd:string)
?:alternateName	fine-tuning Transformers for text data from within R (xsd:string)
?:associatedTask	<https://data.gesis.org/gesiskg/resource/methods_hub_task_data_analysis>
?:codeRepository	<https://github.com/gesistsa/grafzahl>
?:contributor	<https://data.gesis.org/gesiskg/resource/methodshub-grafzahl_Chan_Chung-hong>
?:dateModified	2025 (xsd:gyear)
?:datePublished	2025 (xsd:gyear)
?:description	Duct tape the ‘quanteda’ ecosystem (Benoit et al., 2018) doi:10.21105/joss.00774 to modern Transformer-based text classification models (Wolf et al., 2020) doi:10.18653/v1/2020.emnlp-demos.6, in order to facilitate supervised machine learning for textual data. This package mimics the behaviors of ‘quanteda.textmodels’ and provides a function to set up the ‘Python’ environment to use the pretrained models from ‘Hugging Face’ https://huggingface.co/. More information: doi:10.5117/CCR2023.1.003.CHAN.

Keywords
Deep Learning, Supervised machine learning, Text analysis

Use Cases
This package can be used in any typical supervised machine learning use case involving text data. In the software paper (Chan et al.), several cases were presented, e.g. prediction of incivility based on tweets (Theocharis et al., 2020).

Input Data
grafzahl accepts text data as either a character vector or the corpus data structure of quanteda.

Sample Input and Output Data
A sample input is a corpus. This is an example dataset:

    library(grafzahl)
    library(quanteda)
    unciviltweets

    Corpus consisting of 19,982 documents and 1 docvar.
    text1 : "@ @ Karma gave you a second chance yesterday. Start doing m..."
    text2 : "@ With people like you, Steve King there's still hope for we..."
    text3 : "@ @ You bill is a joke and will sink the GOP. #WEDESERVEBETT..."
    text4 : "@ Dream on. The only thing trump understands is how to enric..."
    text5 : "@ @ Just like the Democrat taliban party was up front with t..."
    text6 : "@ you are going to have more of the same with HRC, and you a..."
    [ reached max_ndoc ... 19,976 more documents ]

The output is an S3 object.

Hardware Requirements
grafzahl runs on any machine that can run R. A GPU that supports CUDA is optional.

Environment Setup
With R installed:

    install.packages("grafzahl")

How to Use
Before training, set up the conda environment:

    setup_grafzahl(cuda = TRUE) ## if you have GPU(s)

A typical way to train and make predictions:

    input <- corpus(ecosent, text_field = "headline")
    training_corpus <- corpus_subset(input, !gold)

Use the x (text data), y (label, in this case a docvar), and model_name (model name, from Hugging Face) parameters to control how the supervised machine learning model is trained.

    model2 <- grafzahl(x = training_corpus, y = "value", model_name = "GroNLP/bert-base-dutch-cased")
    test_corpus <- corpus_subset(input, gold)
    predict(model2, test_corpus)

Technical Details
See the publication for tested and selected models and parameters, the reasoning behind the model selection, and employed datasets for training.

References
Chan, C. H. (2023). grafzahl: fine-tuning Transformers for text data from within R. Computational Communication Research, 5(1), 76. https://doi.org/10.5117/CCR2023.1.003.CHAN

Contact Details
Maintainer: Chung-hong Chan chainsawtiney@gmail.com
Issue Tracker: https://github.com/gesistsa/grafzahl/issues (xsd:string)
?:doi	10.71627/grafzahl.1 ()
?:downloadURL	<https://github.com/gesistsa/grafzahl/>
?:format	SCRIPTS (de) SCRIPTS (en)
?:gitReference	904066193116b1e0646640d432a0c15a10da5c8d ()
is ?:hasPart of	<https://data.gesis.org/gesiskg/resource/>
?:license	GPL-3.0-only (xsd:string)
?:linksDocumentation	<https://gesistsa.github.io/grafzahl/>
?:name	grafzahl (xsd:string)
?:portalUrl	<https://methodshub.gesis.org/library/methods/grafzahl/1/>
?:programmingLanguage	R (en) R (de)
?:relatedPublication	<https://data.gesis.org/gesiskg/resource/methods_hub_pub_grafzahl%3A_fine-tuning_transformers_for_text_data_from_within_r>
?:sourceInfo	GESIS-Methods Hub (xsd:string)
rdf:type	<https://schema.org/SoftwareSourceCode>
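
The description's "How to Use" workflow ends with a call to predict() on a held-out test corpus. A natural next step is to compare those predictions with the gold labels. The following is a minimal sketch in base R only, not part of the package documentation. The vectors `truth` and `predicted` and their label values are hypothetical stand-ins; in a real session they would presumably come from `docvars(test_corpus, "value")` and `predict(model2, test_corpus)`.

```r
# Hypothetical label set and stand-in data (assumptions for illustration only).
# In practice: truth <- docvars(test_corpus, "value")
#              predicted <- predict(model2, test_corpus)
labels    <- c("-1", "0", "1")
truth     <- factor(c("1", "0", "-1", "0", "1", "-1"), levels = labels)
predicted <- factor(c("1", "0", "0", "0", "-1", "-1"), levels = labels)

# Cross-tabulate predictions (rows) against gold labels (columns).
confusion <- table(predicted = predicted, truth = truth)

# Overall accuracy: share of documents on the diagonal.
accuracy <- sum(diag(confusion)) / sum(confusion)

# Per-class recall: correct predictions divided by the number of gold items per class.
recall <- diag(confusion) / colSums(confusion)

print(confusion)
print(accuracy)
print(recall)
```

Fixing the factor levels up front keeps the confusion matrix square even when a class never appears among the predictions, so the diagonal always lines up with the label set.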