QUARK: Controllable Text Generation with Reinforced Unlearning

Authors: Ximing Lu, Sean Welleck, Jack Hessel, Liwei Jiang, Lianhui Qin, Peter West, Prithviraj Ammanabrolu, Yejin Choi

NeurIPS 2022

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "our experiments show that Quark outperforms both strong baselines and state-of-the-art reinforcement learning methods like PPO [66], while relying only on standard language modeling primitives." |
| Researcher Affiliation | Collaboration | Allen Institute for Artificial Intelligence; Paul G. Allen School of Computer Science & Engineering, University of Washington. {ximinglu, jackh, raja}@allenai.org; {wellecks, lwjiang, lianhuiq, pawest, yejin}@cs.washington.edu |
| Pseudocode | Yes | "Algorithm 1 Quantized Reward Konditioning (Quark)" (see the illustrative sketch of the quantization step after this table) |
| Open Source Code | No | "We will release the code for Quark at https://github.com/GXimingLu/Quark prior to NeurIPS 2022." |
| Open Datasets | Yes | REALTOXICITYPROMPTS benchmark, WRITINGPROMPTS dataset [15], OpenWebText Corpus (OWT) [19], SST-2 dataset [70], WIKITEXT-103 [44] |
| Dataset Splits | No | The paper specifies train and test sets (e.g., "85K prompts from the train set; for evaluation, we use the same 10K non-toxic test prompts"), but it gives no explicit details (e.g., size or percentage) for a dedicated validation split used for model tuning or early stopping, although a 'val set' is referenced in figures. |
| Hardware Specification | No | The paper mentions 'Google Cloud Compute' and 'computational resource constraints' but does not specify CPU models, GPU models, or cloud instance types used for the experiments. |
| Software Dependencies | No | The paper mentions software such as Adam [31], PyTorch [53], Hugging Face [81], and DistilBERT [62] but does not provide the version numbers of these dependencies, which are needed for reproducibility. |
| Experiment Setup | Yes | "During training, we use 85K prompts from the train set... We use K = 5 quantiles. During the exploration phase... we mix greedy decoding and nucleus sampling in a 50%-50% proportion..." and, from a separate task setup, "We use K = 8 quantiles." (see the exploration sketch below) |
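
To make the Pseudocode row concrete, here is a minimal sketch of the quantization step named in Algorithm 1 (Quantized Reward Konditioning): sampled continuations are sorted by reward, binned into K quantiles, and each example is prepended with a quantile token before ordinary language-model fine-tuning. The `datapool` structure, the `<R_k>` token format, and the toy data are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch of Quark's reward quantization (Algorithm 1), assuming a
# datapool of (prompt, continuation, reward) examples. The <R_k> token
# format and field names are illustrative, not the authors' exact code.

def quantize_datapool(datapool, K=5):
    """Sort examples by reward, split them into K equal-sized quantiles,
    and prepend a quantile token <R_k> so the LM can condition on it."""
    ranked = sorted(datapool, key=lambda ex: ex["reward"])
    n = len(ranked)
    quantized = []
    for i, ex in enumerate(ranked):
        k = min(i * K // n, K - 1)  # quantile index: 0 = lowest reward, K-1 = highest
        quantized.append({
            "input": f"<R_{k}> {ex['prompt']}",  # reward token conditions generation
            "target": ex["continuation"],
            "quantile": k,
        })
    return quantized

# Toy usage: three continuations with increasing reward land in distinct quantiles.
pool = [
    {"prompt": "The movie was", "continuation": " awful.", "reward": -0.9},
    {"prompt": "The movie was", "continuation": " fine.",  "reward": 0.1},
    {"prompt": "The movie was", "continuation": " great.", "reward": 0.8},
]
for ex in quantize_datapool(pool, K=3):
    print(ex["quantile"], ex["input"], "->", ex["target"])
```

At evaluation time, Quark conditions generation on the highest-reward quantile token (here `<R_2>`), which is what lets standard language-model fine-tuning double as reward optimization.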
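
The Experiment Setup row also quotes a 50%-50% mix of greedy decoding and nucleus sampling during exploration. The sketch below shows one way such a mix could work over a toy next-token distribution; the per-call coin flip, the nucleus threshold `p=0.9`, and the function names are assumptions for illustration, and the paper's exact granularity (per sequence vs. per step) and hyperparameters should be checked against its text.

```python
# Sketch of a 50/50 greedy/nucleus exploration mix over a toy next-token
# distribution. The nucleus threshold p=0.9 and the per-call coin flip are
# illustrative assumptions, not the paper's exact configuration.
import numpy as np

rng = np.random.default_rng(0)

def nucleus_sample(probs, p=0.9):
    """Top-p sampling: keep the smallest high-probability set of tokens
    whose cumulative mass reaches p, renormalize, and sample from it."""
    order = np.argsort(probs)[::-1]       # token ids, most probable first
    cum = np.cumsum(probs[order])
    cutoff = np.searchsorted(cum, p) + 1  # smallest prefix with mass >= p
    keep = order[:cutoff]
    return int(rng.choice(keep, p=probs[keep] / probs[keep].sum()))

def explore_step(probs, greedy_frac=0.5):
    """Pick the next token greedily half the time, by nucleus sampling otherwise."""
    if rng.random() < greedy_frac:
        return int(np.argmax(probs))      # greedy: most probable token
    return nucleus_sample(probs)

# Toy next-token distribution over a 5-token vocabulary.
probs = np.array([0.05, 0.40, 0.30, 0.20, 0.05])
print([explore_step(probs) for _ in range(10)])
```

Mixing the two strategies presumably balances exploitation (greedy keeps samples near high-likelihood continuations) with exploration (nucleus sampling adds diversity) when refilling the datapool.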