Systematic Rectification of Language Models via Dead-end Analysis
Authors: Meng Cao, Mehdi Fatemi, Jackie CK Cheung, Samira Shabanian
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on the REALTOXICITYPROMPTS benchmark. We demonstrate that our method can substantially mitigate toxicity using both automatic and human evaluation. Compared with the regular GPT-2 XL, our method yields a relative reduction in toxicity probability by 78% (83.2% → 18.5%, as measured by PERSPECTIVE API), and it outperforms eight detoxification baselines. (See the arithmetic sketch after the table.) |
| Researcher Affiliation | Collaboration | 1 Mila Québec AI Institute & McGill University; 2 Microsoft Research |
| Pseudocode | No | The paper describes algorithms and formal methods but does not provide pseudocode or an algorithm block. |
| Open Source Code | Yes | https://github.com/mcao516/rectification-lm.git |
| Open Datasets | Yes | We evaluate our method on the REALTOXICITYPROMPTS benchmark. We used the TOXIC COMMENT CLASSIFICATION CHALLENGE dataset to train our reward model. We utilized the CIVILCOMMENTS dataset (Borkan et al., 2019) for QD training. |
| Dataset Splits | Yes | We use 90% of the remaining samples for reward model training and 10% for validation. (See the split sketch after the table.) |
| Hardware Specification | No | The paper mentions using GPT-2, GPT-2 XL, and GPT-3 (the DAVINCI-002 model via the OpenAI API) but does not specify the underlying hardware (e.g., GPU models, CPU specifications) used for running the experiments or training. |
| Software Dependencies | No | All GPT-2 and GPT-2 XL experiments are carried out with the Hugging Face Transformers library. For GPT-3, we use the DAVINCI-002 model in the OpenAI API. We initialize the reward model using BERT (specifically, BERT-base-uncased). The paper does not specify version numbers for these software components. (See the version-logging sketch after the table.) |
| Experiment Setup | Yes | We train the model for 3 epochs with a batch size of 32. We use the AdamW algorithm (Loshchilov & Hutter, 2019) with the learning rate set to 2e-5, Adam beta weights of β1 = 0.9, β2 = 0.999, Adam epsilon of 1e-8, and weight decay of 0.01. We decay the learning rate linearly during training. Table 7 shows the hyperparameters used for QD network training (number of episodes 900K, gamma 1.0, optimizer AdamW, β1, β2 = 0.9, 0.999, Adam weight decay 0.01, Adam ϵ = 1e-8, learning rate 3e-4, Polyak's learning rate 0.5, max length 128, batch size 8, warm-up steps 500). (See the training-configuration sketch after the table.) |
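For the numbers quoted in the Research Type row, the 78% figure is the relative (not absolute) reduction in toxicity probability. A minimal check of that arithmetic:

```python
# Relative reduction in toxicity probability quoted above
# (GPT-2 XL vs. the rectified model, as scored by the Perspective API).
before, after = 0.832, 0.185
relative_reduction = (before - after) / before
print(f"{relative_reduction:.0%}")  # -> 78%
```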
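The Dataset Splits row reports a 90/10 train/validation split for the reward model data. A minimal sketch of such a split; the shuffling and the seed are assumptions, since the paper does not describe the splitting procedure:

```python
import random

def split_train_valid(samples, train_frac=0.9, seed=0):
    """Shuffle and split samples into train/validation sets (seed is assumed)."""
    rng = random.Random(seed)
    samples = list(samples)
    rng.shuffle(samples)
    cut = int(train_frac * len(samples))
    return samples[:cut], samples[cut:]

train, valid = split_train_valid(range(1000))
print(len(train), len(valid))  # -> 900 100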
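Because the paper does not pin library versions, anyone reproducing it should record the versions actually installed. A small snippet for that (assuming a PyTorch-backed Transformers setup, which the released code suggests):

```python
# Log the library versions in use, since the paper does not pin them.
import torch
import transformers

print("transformers:", transformers.__version__)
print("torch:", torch.__version__)
```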
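A minimal sketch of the reward-model fine-tuning setup described in the Experiment Setup row (BERT-base-uncased, AdamW, lr 2e-5, 3 epochs, batch size 32, linear decay). Only the hyperparameters come from the paper; the toy data and the zero warm-up steps here are assumptions:

```python
import torch
from torch.utils.data import DataLoader
from transformers import (
    BertForSequenceClassification,
    BertTokenizerFast,
    get_linear_schedule_with_warmup,
)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Toy stand-in for the Toxic Comment Classification Challenge data.
texts = ["thanks for the thoughtful reply", "you are an idiot"]
labels = [0, 1]  # 0 = non-toxic, 1 = toxic
enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
dataset = [
    {"input_ids": enc["input_ids"][i],
     "attention_mask": enc["attention_mask"][i],
     "labels": torch.tensor(labels[i])}
    for i in range(len(texts))
]
loader = DataLoader(dataset, batch_size=32, shuffle=True)

# Optimizer settings from the paper: AdamW, lr 2e-5, betas (0.9, 0.999),
# eps 1e-8, weight decay 0.01, with linear learning-rate decay.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, betas=(0.9, 0.999),
                              eps=1e-8, weight_decay=0.01)
num_epochs = 3
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=0, num_training_steps=num_epochs * len(loader))

model.train()
for _ in range(num_epochs):
    for batch in loader:
        loss = model(**batch).loss
        loss.backward()
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
```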