Controlled Text Generation as Continuous Optimization with Multiple Constraints

Authors: Sachin Kumar, Eric Malmi, Aliaksei Severyn, Yulia Tsvetkov

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate our approach on controllable machine translation and style transfer with multiple sentence-level attributes and observe significant improvements over baselines."
Researcher Affiliation | Collaboration | Sachin Kumar (Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA, USA; sachink@cs.cmu.edu), Eric Malmi and Aliaksei Severyn (Google Research; {emalmi, severyn}@google.com), Yulia Tsvetkov (Paul G. Allen School of Computer Science & Engineering, University of Washington; yuliats@cs.washington.edu)
Pseudocode | Yes | "The final decoding algorithm we use in all our experiments is described in Algorithm 1 in the Appendix."
Open Source Code | Yes | "The code is available at https://github.com/Sachin19/mucoco"
Open Datasets | Yes | "We train this model by fine-tuning GPT-2 (345M) [51] with the GYAFC Corpus (Entertainment/Music domain; around 50K formal sentences) [53] and evaluate it on the provided test set containing 1312 informal sentences." (See the fine-tuning sketch after this table.)
Dataset Splits | Yes | "We evaluate it on the provided test set containing 1312 informal sentences. We use an exponentiated descent learning rate of η1 = 50 for y and an ascent learning rate of η2 = 2.0 for the multipliers, and run the optimization for 100 steps. For each constraint, we use the following annealing schedule: we start with an initial value, linearly decrease it beginning at step 40 until it reaches the desired value at step 80, and keep it constant thereafter." (See the annealing-schedule sketch after this table.)
Hardware Specification | Yes | "For example, on a single GeForce RTX 2080 Ti (12GB), on which we run all presented experiments, with a batch size of 1, our approach takes approximately 90 minutes on average to decode around 1200 sentences, compared to around 20 minutes for FUDGE [72] with a single constraint."
Software Dependencies | No | The paper mentions using "Hugging Face [70]" and a "Marian Transformer based French (fr) to English (en) model [24] through Huggingface", but does not provide specific version numbers for these or for other software dependencies such as Python or PyTorch.
Experiment Setup | Yes | "For a given sentence length T, we initialize each simplex y1, ..., yT uniformly over the vocabulary. We use an exponentiated descent learning rate of η1 = 50 for y and an ascent learning rate of η2 = 2.0 for the multipliers, and run the optimization for 100 steps." (See the optimization-loop sketch after this table.)
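
As a companion to the Open Datasets row, here is a minimal sketch of how the described fine-tuning step (GPT-2 345M, i.e. the "gpt2-medium" checkpoint, on the formal half of GYAFC) could be set up with the Hugging Face Trainer. The file path, sequence length, and training hyperparameters are illustrative assumptions, not the authors' configuration, and GYAFC itself must be obtained separately.

```python
# Hypothetical sketch: fine-tune GPT-2 (345M, "gpt2-medium") as a causal LM on the
# formal GYAFC sentences. Paths and hyperparameters below are illustrative only.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("gpt2-medium")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2-medium")

# GYAFC is not publicly redistributable; assume the formal training sentences
# have been exported to a local text file, one sentence per line.
raw = load_dataset("text", data_files={"train": "gyafc_em_formal_train.txt"})
tokenized = raw.map(lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
                    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gpt2-gyafc-formal",
                           num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```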
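The annealing schedule quoted in the Dataset Splits row can be restated as a small helper. The sketch below is one reading of that description; the function name and the generic linear interpolation are mine, not taken from the released code.

```python
def annealed_threshold(step, init_value, target_value, start=40, end=80):
    """Hold init_value until `start`, move linearly toward target_value until `end`,
    then hold target_value for the remaining optimization steps (100 in total)."""
    if step <= start:
        return init_value
    if step >= end:
        return target_value
    frac = (step - start) / (end - start)
    return init_value + frac * (target_value - init_value)
```

For example, with hypothetical values where a constraint threshold is annealed from 1.0 down to 0.25, `annealed_threshold(60, 1.0, 0.25)` returns 0.625, the midpoint of the schedule.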
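Finally, the Experiment Setup row describes a primal-dual loop: exponentiated gradient descent on the token simplexes with η1 = 50 and gradient ascent on the Lagrange multipliers with η2 = 2.0, run for 100 steps. The sketch below shows a loop of that general shape; `primal_loss` and the `constraints` callables are placeholders of my own, and the released code at https://github.com/Sachin19/mucoco remains the authoritative implementation.

```python
import torch

def decode(vocab_size, T, primal_loss, constraints, steps=100, eta1=50.0, eta2=2.0):
    """Sketch of primal-dual decoding: exponentiated descent on simplexes y1..yT and
    gradient ascent on Lagrange multipliers. `primal_loss(y)` and each g(y) in
    `constraints` are assumed differentiable stand-ins for the real objective and
    constraint functions, which are not reproduced here."""
    # One probability simplex per output position, initialized uniformly over the vocabulary.
    y = torch.full((T, vocab_size), 1.0 / vocab_size, requires_grad=True)
    lam = torch.zeros(len(constraints), requires_grad=True)  # one multiplier per constraint

    for _ in range(steps):
        violations = torch.stack([g(y) for g in constraints])
        lagrangian = primal_loss(y) + (lam * violations).sum()
        grad_y, grad_lam = torch.autograd.grad(lagrangian, [y, lam])

        with torch.no_grad():
            # Exponentiated gradient descent keeps each row of y on the simplex.
            y_new = y * torch.exp(-eta1 * grad_y)
            y.copy_(y_new / y_new.sum(dim=-1, keepdim=True))
            # Gradient ascent on the multipliers, projected to stay non-negative.
            lam.copy_((lam + eta2 * grad_lam).clamp(min=0.0))

    return y.argmax(dim=-1)  # greedy token readout from the relaxed solution
```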