Leashing the Inner Demons: Self-Detoxification for Language Models
Authors: Canwen Xu, Zexue He, Zhankui He, Julian McAuley
AAAI 2022, pp. 11530-11537
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we conduct extensive experiments to study this phenomenon. We analyze the impact of prompts, decoding strategies and training corpora on the output toxicity. |
| Researcher Affiliation | Academia | University of California, San Diego {cxu,zehe,zhh004,jmcauley}@ucsd.edu |
| Pseudocode | No | The paper describes the methodology in prose and with a workflow diagram (Figure 1), but does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions that its implementation is based on Hugging Face Transformers but does not provide a direct link to, or an explicit statement about, the availability of its own source code. |
| Open Datasets | Yes | We sample 5,000 prompts from WritingPrompts (Fan, Lewis, and Dauphin 2018). ... we use 5,000 prompts associated with the highest toxicity from RealToxicityPrompts (Gehman et al. 2020). |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits. It mentions using 5,000 prompts for generation and then fine-tuning on the generated text, but does not state a specific validation split for fine-tuning or evaluation. |
| Hardware Specification | Yes | We generate text on an Nvidia V100, requiring around 12h to generate 5,000 samples. |
| Software Dependencies | No | The paper states 'Our implementation is based on Hugging Face Transformers (Wolf et al. 2020)' but does not provide specific version numbers for this or any other software dependencies. |
| Experiment Setup | Yes | The maximum generation length is set to 200. The temperature is set to 1 for top-k, top-p, and beam search. |
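
The reported decoding setup maps directly onto the Hugging Face Transformers `generate` API. The sketch below is a minimal, hypothetical reconstruction assuming a GPT-2-style causal LM; the model name, prompt, and the exact `top_k`/`top_p` values are illustrative and not quoted from the paper, while `max_length=200` and `temperature=1.0` follow the setup stated in the table.

```python
# Minimal sketch of the decoding setup described above, assuming a GPT-2-style
# causal LM loaded through Hugging Face Transformers. The model name, prompt,
# and the exact top_k / top_p values are illustrative, not taken from the paper.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "He stepped onto the stage and"  # stand-in for one of the 5,000 sampled prompts
inputs = tokenizer(prompt, return_tensors="pt")

output_ids = model.generate(
    **inputs,
    max_length=200,   # maximum generation length reported in the paper
    do_sample=True,   # sampling-based decoding (top-k / top-p)
    temperature=1.0,  # temperature set to 1 for top-k, top-p, and beam search
    top_k=50,         # illustrative value; not quoted in the table
    top_p=0.9,        # illustrative value; not quoted in the table
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

With this configuration, generating 5,000 continuations of up to 200 tokens each is consistent with the roughly 12-hour runtime on a single Nvidia V100 reported in the Hardware Specification row.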