Soft-Label Integration for Robust Toxicity Classification

Authors: Zelei Cheng, Xian Wu, Jiahao Yu, Shuo Han, Xin-Qiang Cai, Xinyu Xing

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate that our approach outperforms existing baseline methods in terms of both average and worst-group accuracy, confirming its effectiveness in leveraging crowdsourced annotations to achieve more effective and robust toxicity classification.
Researcher Affiliation | Academia | Zelei Cheng (Northwestern University, Evanston, USA, zelei.cheng@northwestern.edu); Xian Wu (Northwestern University, Evanston, USA, xianwu2024@u.northwestern.edu); Jiahao Yu (Northwestern University, Evanston, USA, jiahao.yu@northwestern.edu); Shuo Han (Northwestern University, Evanston, USA, shuo.han.1@u.northwestern.edu); Xin-Qiang Cai (The University of Tokyo, Tokyo, Japan, xinqiang.cai@riken.jp); Xinyu Xing (Northwestern University, Evanston, USA, xinyu.xing@northwestern.edu)
Pseudocode | Yes | We provide the full algorithm in Algorithm 1.
Open Source Code | Yes | We release the data and code at https://github.com/chengzelei/crowdsource_toxicity_classification.
Open Datasets | Yes | Additionally, we conduct our experiments on the public HateXplain dataset [20].
Dataset Splits | Yes | For each classification task, we have a large training set with crowdsourced annotations (i.e., 6,941 samples for toxic question classification and 28,194 samples for toxic response classification) and a testing set containing 2,000 samples with ground truth. The validation set with ground truth includes a small number of samples (i.e., 1,000 samples) from the training set.
Hardware Specification | Yes | We train the machine learning models on a server with 8 NVIDIA A100 80GB GPUs and 4 TB of memory for all the learning algorithms.
Software Dependencies | Yes | The toxicity classifier and soft-label weight estimator are both implemented based on the transformers library, version 4.34.1 [62].
Experiment Setup | Yes | We list the hyper-parameter settings for all experiments in Appendix C.3.
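The dataset-split row above only states the split sizes (e.g. 6,941 crowd-labelled training samples for toxic question classification, a 1,000-sample ground-truth validation set drawn from the training set, and 2,000 test samples). As a minimal sketch, the held-out validation set could be carved out like this; the sampling procedure and function name are assumptions, not the paper's actual code:

```python
import random

def make_splits(training_pool, val_size=1000, seed=0):
    """Hold out a small ground-truth validation set from the training pool.

    Hypothetical helper: the report gives only the split sizes, so random
    sampling with a fixed seed is an assumed (not confirmed) procedure.
    """
    rng = random.Random(seed)
    indices = list(range(len(training_pool)))
    rng.shuffle(indices)
    val_idx = set(indices[:val_size])
    train = [x for i, x in enumerate(training_pool) if i not in val_idx]
    val = [x for i, x in enumerate(training_pool) if i in val_idx]
    return train, val

# Toxic question classification: 6,941 crowd-labelled samples in the pool.
pool = list(range(6941))
train, val = make_splits(pool, val_size=1000)
```

With these sizes, the resulting training set holds 5,941 samples and the validation set 1,000, matching the counts reported in the table.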