QUARK: Controllable Text Generation with Reinforced Unlearning
Authors: Ximing Lu, Sean Welleck, Jack Hessel, Liwei Jiang, Lianhui Qin, Peter West, Prithviraj Ammanabrolu, Yejin Choi
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that Quark outperforms both strong baselines and state-of-the-art reinforcement learning methods like PPO [66], while relying only on standard language modeling primitives. |
| Researcher Affiliation | Collaboration | Allen Institute for Artificial Intelligence Paul G. Allen School of Computer Science, University of Washington {ximinglu, jackh, raja}@allenai.org {wellecks, lwjiang, lianhuiq, pawest, yejin}@cs.washington.edu |
| Pseudocode | Yes | Algorithm 1 Quantized Reward Konditioning (Quark) |
| Open Source Code | No | We will release the code for Quark at https://github.com/GXimingLu/Quark prior to NeurIPS 2022. |
| Open Datasets | Yes | REALTOXICITYPROMPTS benchmark, WRITINGPROMPTS dataset [15], OpenWebText Corpus (OWT) [19], SST-2 dataset [70], WIKITEXT-103 [44] |
| Dataset Splits | No | The paper specifies train and test sets (e.g., "85K prompts from the train set; for evaluation, we use the same 10K non-toxic test prompts"), but it does not provide explicit details (e.g., size or percentages) for a dedicated validation split used for model tuning or early stopping, although 'val set' is referenced in figures. |
| Hardware Specification | No | The paper mentions 'Google Cloud Compute' and 'computational resource constraints' but does not specify any particular CPU models, GPU models, or detailed cloud instance types used for experiments. |
| Software Dependencies | No | The paper mentions software like Adam [31], PyTorch [53], Hugging Face [81], and DistilBERT [62] but does not provide specific version numbers for these software dependencies required for reproducibility. |
| Experiment Setup | Yes | During training, we use 85K prompts from the train set... We use K = 5 quantiles. During the exploration phase... we mix greedy decoding and nucleus sampling in a 50%-50% proportion... We use K = 8 quantiles. |
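
The reward quantization described in the setup row (Quark's "Quantized Reward Konditioning") partitions sampled generations into K equal-sized reward quantiles and conditions each sample on its quantile token. A minimal sketch of that binning step is below; the helper name `assign_quantile_tokens` is hypothetical, not from the paper's released code.

```python
import numpy as np

def assign_quantile_tokens(rewards, k=5):
    """Assign each sample an integer reward-quantile token (0 = lowest
    quantile, k-1 = highest), as in Quark's quantization step. Rank-based
    binning: equal-sized bins over the reward-sorted samples."""
    rewards = np.asarray(rewards, dtype=float)
    order = rewards.argsort()              # indices sorted by ascending reward
    bins = np.array_split(order, k)        # k roughly equal-sized bins
    tokens = np.empty(len(rewards), dtype=int)
    for q, idx in enumerate(bins):
        tokens[idx] = q                    # token q marks the q-th quantile
    return tokens
```

During training, Quark prepends the token for the *highest* quantile at generation time, steering the model toward high-reward behavior.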