Leashing the Inner Demons: Self-Detoxification for Language Models

Authors: Canwen Xu, Zexue He, Zhankui He, Julian McAuley

AAAI 2022, pp. 11530-11537

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "In this paper, we conduct extensive experiments to study this phenomenon. We analyze the impact of prompts, decoding strategies and training corpora on the output toxicity." |
| Researcher Affiliation | Academia | University of California, San Diego ({cxu,zehe,zhh004,jmcauley}@ucsd.edu) |
| Pseudocode | No | The paper describes the methodology in prose and with a workflow diagram (Figure 1), but contains no structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper notes that its implementation is based on Hugging Face Transformers, but neither links to nor explicitly addresses the availability of its own source code. |
| Open Datasets | Yes | "We sample 5,000 prompts from Writing Prompts (Fan, Lewis, and Dauphin 2018). ... we use 5,000 prompts associated with the highest toxicity from Real Toxic Prompts (Gehman et al. 2020)." A sampling sketch follows the table. |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test splits. It mentions using 5,000 prompts for generation and then fine-tuning on the generated text, but states no validation split for either the fine-tuning or the evaluation. |
| Hardware Specification | Yes | "We generate text on an Nvidia V100, requiring around 12h to generate 5,000 samples." |
| Software Dependencies | No | The paper states "Our implementation is based on Hugging Face Transformers (Wolf et al. 2020)" but provides no version numbers for this or any other software dependency. |
| Experiment Setup | Yes | "The maximum generation length is set to 200. The temperature is set to 1 for top-k, top-p, and beam search." See the decoding sketch after the table. |
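Since the Experiment Setup row pins down only the generation length and the temperature, the following is a minimal sketch of the three quoted decoding strategies in Hugging Face Transformers. The k, p, and beam-width values, and the `gpt2` checkpoint, are illustrative placeholders, not settings reported in the paper.

```python
# Sketch of the quoted decoding setup: max generation length 200,
# temperature 1 for top-k, top-p, and beam search. Placeholder values
# (k=50, p=0.9, 5 beams, the "gpt2" checkpoint) are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Once upon a time", return_tensors="pt")
sample_args = dict(
    max_new_tokens=200,  # "The maximum generation length is set to 200."
    temperature=1.0,     # "The temperature is set to 1 ..."
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)

out_topk = model.generate(**inputs, top_k=50, **sample_args)             # top-k sampling
out_topp = model.generate(**inputs, top_p=0.9, top_k=0, **sample_args)   # top-p (nucleus) sampling
out_beam = model.generate(                                               # beam search (deterministic,
    **inputs, num_beams=5, do_sample=False,                              # so temperature has no effect)
    max_new_tokens=200, pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(out_topk[0], skip_special_tokens=True))
```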
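Likewise, the Open Datasets row names two prompt pools, but the paper's own sampling code is not released. The sketch below shows one way to reproduce the selection, assuming the `allenai/real-toxicity-prompts` mirror on the Hugging Face Hub and a local copy of the original Writing Prompts release; the file path and random seed are illustrative, as the paper states neither.

```python
import random
from datasets import load_dataset

# 5,000 highest-toxicity prompts from RealToxicityPrompts (Gehman et al. 2020).
# The Hub mirror "allenai/real-toxicity-prompts" is an assumption; some prompts
# lack a toxicity score (None), hence the fallback to 0.0.
rtp = load_dataset("allenai/real-toxicity-prompts", split="train")
rtp_sorted = sorted(rtp, key=lambda r: r["prompt"]["toxicity"] or 0.0, reverse=True)
toxic_prompts = [r["prompt"]["text"] for r in rtp_sorted[:5000]]

# 5,000 prompts sampled from Writing Prompts (Fan, Lewis, and Dauphin 2018).
# "train.wp_source" mirrors the file layout of the original release; the
# path and the seed are illustrative.
random.seed(0)
with open("writingPrompts/train.wp_source", encoding="utf-8") as f:
    wp = [line.strip() for line in f if line.strip()]
wp_prompts = random.sample(wp, 5000)
```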