Prefer to Classify: Improving Text Classifiers via Auxiliary Preference Learning

Authors: Jaehyung Kim, Jinwoo Shin, Dongyeop Kang

ICML 2023

Reproducibility Variable Result LLM Response
Research Type Experimental Through the extensive experiments on ten text classification datasets, we demonstrate that the proposed auxiliary preference learning via P2C on them is effective in improving text classifiers.
Researcher Affiliation Academia Jaehyung Kim (KAIST; work was mainly done while visiting Minnesota NLP), Jinwoo Shin (KAIST), Dongyeop Kang (University of Minnesota).
Pseudocode Yes Algorithm 1 Prefer-to-Classify (P2C) with extractive preference labels; Algorithm 2 Prefer-to-Classify (P2C) with subjective/generative preference labels
Open Source Code Yes Our codes are publicly available: https://github.com/minnesotanlp/p2c
Open Datasets Yes CoLA (Warstadt et al., 2019), SMS Spam (Almeida et al., 2011), Hate Speech (Fišer et al., 2018), Emotion (Saravia et al., 2018), DynaSent (Potts et al., 2021), Stanford politeness corpus (Danescu-Niculescu-Mizil et al., 2013), Offensive agreement dataset (Leonardelli et al., 2021), MultiNLI (Williams et al., 2018), SUN Attribute dataset (Patterson et al., 2014)
Dataset Splits Yes DynaSent-R1 comprises 80,488 training samples, 3,600 validation samples, and 3,600 test samples. DynaSent-R2 comprises 13,065 training samples, 720 validation samples, and 720 test samples. We split each dataset into an 8:1:1 ratio to construct training, validation, and test datasets. One re-constructed dataset has 2,400 training samples, 400 validation samples, and 400 test samples; another has 15,717 training samples, 1,964 validation samples, and 1,966 test samples.
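The 8:1:1 splitting procedure quoted above can be sketched in plain Python; the shuffling seed and function name here are illustrative assumptions, since the paper does not publish its splitting code:

```python
import random

def split_811(samples, seed=42):
    """Shuffle a dataset and split it 80% / 10% / 10% into
    train, validation, and test subsets (seed is a hypothetical choice)."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * 0.8)
    n_val = int(n * 0.1)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

# Example: 1,000 dummy samples yield an 800 / 100 / 100 split.
train, val, test = split_811(list(range(1000)))
```

Any leftover samples from integer rounding fall into the test split, so the three subsets always cover the full dataset.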
Hardware Specification No The paper does not explicitly describe the hardware used for experiments, such as specific GPU models, CPU models, or cloud computing instance types.
Software Dependencies No The paper mentions software components like 'RoBERTa-base' and 'Adam optimizer' but does not specify their version numbers or the versions of any underlying frameworks like PyTorch or TensorFlow.
Experiment Setup Yes All the experiments are conducted by fine-tuning RoBERTa-base (Liu et al., 2019) using the Adam optimizer (Kingma & Ba, 2015) with a fixed learning rate of 1e-5 and the default hyper-parameters of Adam. For all datasets, the model is fine-tuned using the specified training method with batch size 16 for 20 epochs. We choose hyper-parameters from a fixed set of candidates based on the validation set: λcons, λdiv ∈ {1.0, 0.1}.
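The reported hyper-parameter search amounts to a small 2×2 grid over λcons and λdiv, with the remaining settings fixed. A minimal sketch of enumerating the candidate configurations (the dictionary keys and function name are my own, not taken from the released code):

```python
from itertools import product

# Candidate values for the two loss weights, as reported in the paper.
LAMBDA_CANDIDATES = [1.0, 0.1]

def candidate_configs():
    """Enumerate the 2x2 grid of (lambda_cons, lambda_div) pairs combined
    with the fixed training settings; the best pair would be selected by
    validation-set performance."""
    return [
        {
            "learning_rate": 1e-5,
            "batch_size": 16,
            "epochs": 20,
            "lambda_cons": lam_cons,
            "lambda_div": lam_div,
        }
        for lam_cons, lam_div in product(LAMBDA_CANDIDATES, LAMBDA_CANDIDATES)
    ]

configs = candidate_configs()  # 4 configurations in total
```

Each configuration would then be run once and scored on the validation set, which keeps the search cost at four fine-tuning runs per dataset.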