Systematic Rectification of Language Models via Dead-end Analysis

Authors: Meng Cao, Mehdi Fatemi, Jackie CK Cheung, Samira Shabanian

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our method on the REALTOXICITYPROMPTS benchmark. We demonstrate that our method can substantially mitigate toxicity using both automatic and human evaluation. Compared with the regular GPT-2 XL, our method yields a relative reduction in toxicity probability of 78% (83.2% → 18.5%, as measured by PERSPECTIVE API), and it outperforms eight detoxification baselines. (The 78% figure is worked out after the table.)
Researcher Affiliation | Collaboration | 1. Mila Québec AI Institute & McGill University; 2. Microsoft Research
Pseudocode | No | The paper describes algorithms and formal methods but does not provide pseudocode or an algorithm block.
Open Source Code | Yes | https://github.com/mcao516/rectification-lm.git
Open Datasets | Yes | We evaluate our method on the REALTOXICITYPROMPTS benchmark. We used the TOXIC COMMENT CLASSIFICATION CHALLENGE dataset to train our reward model. We utilized the CIVILCOMMENTS dataset (Borkan et al., 2019) for QD training.
Dataset Splits | Yes | We use 90% of the remaining samples for reward-model training and 10% for validation. (See the split sketch after the table.)
Hardware Specification | No | The paper mentions using GPT-2, GPT-2 XL, and GPT-3 (the DAVINCI-002 model via the OpenAI API) but does not specify the underlying hardware (e.g., GPU models, CPU specifications) used for running the experiments or training.
Software Dependencies | No | All GPT-2 and GPT-2 XL experiments are carried out with the Hugging Face Transformers library. For GPT-3, we use the DAVINCI-002 model in the OpenAI API. We initialize the reward model using BERT (specifically, BERT-base-uncased). The paper does not specify version numbers for these software components. (See the model-loading sketch after the table.)
Experiment Setup | Yes | We train the model for 3 epochs with a batch size of 32. We use the AdamW algorithm (Loshchilov & Hutter, 2019) with the learning rate set to 2e-5, Adam beta weights β1 = 0.9, β2 = 0.999, Adam epsilon of 1e-8, and weight decay of 0.01. We decay the learning rate linearly during training. Table 7 lists the hyperparameters used for QD-network training: number of episodes 900K; gamma 1.0; optimizer AdamW; β1, β2 = 0.9, 0.999; AdamW weight decay 0.01; Adam ϵ = 1e-8; learning rate 3e-4; Polyak's learning rate 0.5; max length 128; batch size 8; warm-up steps 500. (See the optimizer sketch after the table.)
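
The 78% relative reduction quoted in the Research Type row follows directly from the two absolute toxicity probabilities reported for GPT-2 XL before and after rectification:

\[
\frac{83.2\% - 18.5\%}{83.2\%} \approx 0.78
\]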
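
For the Dataset Splits row, here is a minimal sketch of the quoted 90/10 protocol. The CSV path, the use of the `datasets` library, and the seed are illustrative assumptions; the report does not say how the Toxic Comment Classification Challenge data was ingested.

```python
# Hedged sketch of the 90/10 reward-model data split described above.
from datasets import load_dataset

# Assumed ingestion step: "train.csv" is a placeholder for the Kaggle
# Toxic Comment Classification Challenge training file.
ds = load_dataset("csv", data_files="train.csv")["train"]

# 90% for reward-model training, 10% for validation, per the quoted split.
splits = ds.train_test_split(test_size=0.1, seed=42)  # seed is assumed
train_ds, valid_ds = splits["train"], splits["test"]
```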
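
For the Software Dependencies row, a hedged sketch of loading the described model stack with current Hugging Face Transformers APIs (the paper gives no version numbers, so the exact API surface is an assumption):

```python
# Sketch of the model stack: GPT-2 XL as the base LM and a reward model
# initialized from BERT-base-uncased, as described above.
from transformers import (
    AutoModelForCausalLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
)

# Base language model evaluated in the paper's main experiments.
lm_tokenizer = AutoTokenizer.from_pretrained("gpt2-xl")
lm = AutoModelForCausalLM.from_pretrained("gpt2-xl")

# Reward model initialized from BERT-base-uncased; the two-label
# (toxic / non-toxic) head is an assumption about the classifier setup.
rm_tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
reward_model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)
```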
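
For the Experiment Setup row, a sketch of the quoted reward-model optimizer configuration using PyTorch's `AdamW` and the Transformers linear-decay schedule. `reward_model` comes from the sketch above; `num_training_steps` is a placeholder, since the report does not quote the dataset size:

```python
import torch
from transformers import get_linear_schedule_with_warmup

# 3 epochs at batch size 32 fix the true step count; a placeholder is
# used here because the number of training samples is not quoted.
num_training_steps = 1000  # placeholder, not from the paper

optimizer = torch.optim.AdamW(
    reward_model.parameters(),
    lr=2e-5,               # learning rate from the quoted setup
    betas=(0.9, 0.999),    # Adam beta weights
    eps=1e-8,              # Adam epsilon
    weight_decay=0.01,     # weight decay
)

# Linear learning-rate decay during training; zero warm-up is an
# assumption (the 500 warm-up steps are quoted only for QD training).
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=0, num_training_steps=num_training_steps
)
```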