Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Leashing the Inner Demons: Self-Detoxification for Language Models
Authors: Canwen Xu, Zexue He, Zhankui He, Julian McAuley (pp. 11530-11537)
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we conduct extensive experiments to study this phenomenon. We analyze the impact of prompts, decoding strategies and training corpora on the output toxicity. |
| Researcher Affiliation | Academia | University of California, San Diego |
| Pseudocode | No | The paper describes the methodology in prose and with a workflow diagram (Figure 1), but does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions its implementation is based on Hugging Face Transformers but does not provide a direct link or explicit statement for the availability of its own source code. |
| Open Datasets | Yes | We sample 5,000 prompts from Writing Prompts (Fan, Lewis, and Dauphin 2018). ... we use 5,000 prompts associated with the highest toxicity from Real Toxic Prompts (Gehman et al. 2020). |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits. It mentions using 5,000 prompts for generation and then fine-tuning on the generated text, but no specific validation split for the fine-tuning or evaluation is stated. |
| Hardware Specification | Yes | We generate text on an Nvidia V100, requiring around 12h to generate 5,000 samples. |
| Software Dependencies | No | The paper states 'Our implementation is based on Hugging Face Transformers (Wolf et al. 2020)' but does not provide specific version numbers for this or any other software dependencies. |
| Experiment Setup | Yes | The maximum generation length is set to 200. The temperature is set to 1 for top-k, top-p, and beam search. |
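
The Experiment Setup row can be made concrete with a small configuration sketch. It is not the authors' code (none is released, per the Open Source Code row); it only shows how the two reported values, a maximum generation length of 200 and a temperature of 1, might be passed to the `generate` method of Hugging Face Transformers. The helper name `generation_kwargs` is hypothetical, the `top_k`, `top_p`, and `num_beams` defaults are placeholders because this page does not report them, and mapping "maximum generation length" to `max_new_tokens` rather than `max_length` is an assumption.

```python
from typing import Any, Dict

# Values quoted in the Experiment Setup row.
MAX_GENERATION_LENGTH = 200
TEMPERATURE = 1.0


def generation_kwargs(
    strategy: str,
    *,
    top_k: int = 50,      # placeholder, not reported on this page
    top_p: float = 0.9,   # placeholder, not reported on this page
    num_beams: int = 5,   # placeholder, not reported on this page
) -> Dict[str, Any]:
    """Build keyword arguments for a Hugging Face `generate` call.

    `strategy` is one of "top-k", "top-p", or "beam" (hypothetical labels).
    """
    # Assumption: the reported length limit applies to newly generated tokens.
    common = {"max_new_tokens": MAX_GENERATION_LENGTH, "temperature": TEMPERATURE}
    if strategy == "top-k":
        # top_p=1.0 disables nucleus filtering, so only top-k is active.
        return {**common, "do_sample": True, "top_k": top_k, "top_p": 1.0}
    if strategy == "top-p":
        # top_k=0 disables top-k filtering, so only nucleus sampling is active.
        return {**common, "do_sample": True, "top_k": 0, "top_p": top_p}
    if strategy == "beam":
        # Beam search is deterministic; a temperature of 1 leaves scores unchanged.
        return {**common, "do_sample": False, "num_beams": num_beams}
    raise ValueError(f"unknown decoding strategy: {strategy!r}")
```

With a loaded model and tokenized prompt, the result would be unpacked into the call, for example `model.generate(**inputs, **generation_kwargs("top-p"))`. Keeping the shared values in one place makes it harder for the three decoding strategies to drift apart in length or temperature.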