Controlled Text Generation as Continuous Optimization with Multiple Constraints

Authors: Sachin Kumar, Eric Malmi, Aliaksei Severyn, Yulia Tsvetkov

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate our approach on controllable machine translation and style transfer with multiple sentence-level attributes and observe significant improvements over baselines."
Researcher Affiliation | Collaboration | Sachin Kumar (Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA, USA; sachink@cs.cmu.edu), Eric Malmi and Aliaksei Severyn (Google Research; {emalmi, severyn}@google.com), Yulia Tsvetkov (Paul G. Allen School of Computer Science & Engineering, University of Washington; yuliats@cs.washington.edu)
Pseudocode | Yes | "The final decoding algorithm we use in all our experiments is described in Algorithm 1 in the Appendix."
Open Source Code | Yes | "The code is available at https://github.com/Sachin19/mucoco"
Open Datasets | Yes | "We train this model by fine-tuning GPT-2 (345M) [51] with the GYAFC Corpus (Entertainment/Music domain; around 50K formal sentences) [53] and evaluate it on the provided test set containing 1312 informal sentences." (See the fine-tuning sketch after this table.)
Dataset Splits | Yes | "We evaluate it on the provided test set containing 1312 informal sentences. We use an exponentiated descent learning rate of η1 = 50 for y and an ascent learning rate of η2 = 2.0 for the multipliers, and run the optimization for 100 steps. For each constraint, we use the following annealing schedule: we start with an initial value, linearly decrease it beginning at step 40 until it reaches the desired value at step 80, and keep it constant thereafter." (See the annealing-schedule sketch after this table.)
Hardware Specification | Yes | "For example, on a single GeForce RTX 2080 Ti (12GB), on which we run all presented experiments, with a batch size of 1, our approach takes approximately 90 minutes on average to decode around 1200 sentences, compared to around 20 minutes for FUDGE [72] with a single constraint."
Software Dependencies | No | The paper mentions using "Hugging Face [70]" and a "Marian Transformer based French (fr) to English (en) model [24] through Huggingface", but does not provide specific version numbers for these or for other software dependencies such as Python or PyTorch.
Experiment Setup | Yes | "For a given sentence length T, we initialize each simplex y1, ..., yT uniformly over the vocabulary. We use an exponentiated descent learning rate of η1 = 50 for y and an ascent learning rate of η2 = 2.0 for the multipliers, and run the optimization for 100 steps." (See the optimization-loop sketch after this table.)
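
As a companion to the Open Datasets row, here is a minimal sketch of how the described fine-tuning step (GPT-2 345M, i.e. the "gpt2-medium" checkpoint, on the formal half of GYAFC) could be set up with the Hugging Face Trainer. The file path, sequence length, and training hyperparameters are illustrative assumptions, not the authors' configuration, and GYAFC itself must be obtained separately.

```python
# Hypothetical sketch: fine-tune GPT-2 (345M, "gpt2-medium") as a causal LM on the
# formal GYAFC sentences. Paths and hyperparameters below are illustrative only.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("gpt2-medium")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2-medium")

# GYAFC is not publicly redistributable; assume the formal training sentences
# have been exported to a local text file, one sentence per line.
raw = load_dataset("text", data_files={"train": "gyafc_em_formal_train.txt"})
tokenized = raw.map(lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
                    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gpt2-gyafc-formal",
                           num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```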
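The annealing schedule quoted in the Dataset Splits row can be restated as a small helper. The sketch below is one reading of that description; the function name and the generic linear interpolation are mine, not taken from the released code.

```python
def annealed_threshold(step, init_value, target_value, start=40, end=80):
    """Hold init_value until `start`, move linearly toward target_value until `end`,
    then hold target_value for the remaining optimization steps (100 in total)."""
    if step <= start:
        return init_value
    if step >= end:
        return target_value
    frac = (step - start) / (end - start)
    return init_value + frac * (target_value - init_value)
```

For example, with hypothetical values where a constraint threshold is annealed from 1.0 down to 0.25, `annealed_threshold(60, 1.0, 0.25)` returns 0.625, the midpoint of the schedule.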
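Finally, the Experiment Setup row describes a primal-dual loop: exponentiated gradient descent on the token simplexes with η1 = 50 and gradient ascent on the Lagrange multipliers with η2 = 2.0, run for 100 steps. The sketch below shows a loop of that general shape; `primal_loss` and the `constraints` callables are placeholders of my own, and the released code at https://github.com/Sachin19/mucoco remains the authoritative implementation.

```python
import torch

def decode(vocab_size, T, primal_loss, constraints, steps=100, eta1=50.0, eta2=2.0):
    """Sketch of primal-dual decoding: exponentiated descent on simplexes y1..yT and
    gradient ascent on Lagrange multipliers. `primal_loss(y)` and each g(y) in
    `constraints` are assumed differentiable stand-ins for the real objective and
    constraint functions, which are not reproduced here."""
    # One probability simplex per output position, initialized uniformly over the vocabulary.
    y = torch.full((T, vocab_size), 1.0 / vocab_size, requires_grad=True)
    lam = torch.zeros(len(constraints), requires_grad=True)  # one multiplier per constraint

    for _ in range(steps):
        violations = torch.stack([g(y) for g in constraints])
        lagrangian = primal_loss(y) + (lam * violations).sum()
        grad_y, grad_lam = torch.autograd.grad(lagrangian, [y, lam])

        with torch.no_grad():
            # Exponentiated gradient descent keeps each row of y on the simplex.
            y_new = y * torch.exp(-eta1 * grad_y)
            y.copy_(y_new / y_new.sum(dim=-1, keepdim=True))
            # Gradient ascent on the multipliers, projected to stay non-negative.
            lam.copy_((lam + eta2 * grad_lam).clamp(min=0.0))

    return y.argmax(dim=-1)  # greedy token readout from the relaxed solution
```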