Controlled Text Generation as Continuous Optimization with Multiple Constraints
Authors: Sachin Kumar, Eric Malmi, Aliaksei Severyn, Yulia Tsvetkov
NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach on controllable machine translation and style transfer with multiple sentence-level attributes and observe significant improvements over baselines. |
| Researcher Affiliation | Collaboration | Sachin Kumar (Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA, USA); Eric Malmi and Aliaksei Severyn (Google Research); Yulia Tsvetkov (Paul G. Allen School of Computer Science & Engineering, University of Washington). Emails: sachink@cs.cmu.edu, {emalmi, severyn}@google.com, yuliats@cs.washington.edu |
| Pseudocode | Yes | The final decoding algorithm we use in all our experiments is described in the Appendix algorithm 1. |
| Open Source Code | Yes | The code is available at https://github.com/Sachin19/mucoco |
| Open Datasets | Yes | We train this model by fine-tuning GPT2 (345M) [51] with the GYAFC Corpus (Entertainment/Music domain; around 50K formal sentences) [53] and evaluate it on the provided test set containing 1312 informal sentences. |
| Dataset Splits | Yes | We evaluate it on the provided test set containing 1312 informal sentences. We use an exponentiated-gradient descent learning rate of η_1 = 50 for y and an ascent learning rate of η_2 = 2.0 for the multipliers, and run the optimization for 100 steps. For each constraint, we use the following annealing schedule: we start with an initial value and linearly decrease it starting at step 40 until it reaches the desired value at step 80, after which we keep it constant. (A sketch of this schedule appears below the table.) |
| Hardware Specification | Yes | For example, on a single GeForce RTX 2080 Ti (12GB) on which we run all presented experiments, with a batch size of 1, our approach takes approximately 90 minutes on average to decode around 1200 sentences compared to around 20 minutes for FUDGE [72] with a single constraint. |
| Software Dependencies | No | The paper mentions using 'Hugging Face [70]' and 'Marian Transformer based French (fr) to English (en) model [24] through Huggingface' but does not provide specific version numbers for these or other software dependencies like Python or PyTorch. |
| Experiment Setup | Yes | For a given sentence length T, we initialize each simplex y_1, ..., y_T uniformly over the vocabulary. We use an exponentiated-gradient descent learning rate of η_1 = 50 for y and an ascent learning rate of η_2 = 2.0 for the multipliers, and run the optimization for 100 steps. (A sketch of this update loop appears below the table.) |
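
The Dataset Splits row above quotes the constraint-annealing schedule used during decoding: hold an initial threshold, ramp linearly between steps 40 and 80, then hold the target value. A minimal sketch of that schedule follows; the function name, arguments, and defaults are illustrative and not taken from the paper or the released code.

```python
def annealed_threshold(step, initial, target, start=40, end=80):
    """Piecewise-linear schedule: hold `initial` until `start`,
    interpolate linearly to `target` between `start` and `end`,
    then hold `target` for the remaining optimization steps."""
    if step <= start:
        return initial
    if step >= end:
        return target
    frac = (step - start) / (end - start)
    return initial + frac * (target - initial)
```

For example, with `initial=1.0` and `target=0.25`, steps 0 through 40 return 1.0, step 60 returns 0.625, and steps 80 and beyond return 0.25.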
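The Experiment Setup row describes the core decoding loop: each target position t holds a simplex y_t over the vocabulary, updated with exponentiated gradient descent, while the Lagrange multipliers for the constraints are updated with gradient ascent. The sketch below illustrates that loop under stated assumptions; `decode`, `objective`, and `constraints` are placeholder names (standing in for the model negative log-likelihood and the attribute constraints), not identifiers from the mucoco repository.

```python
import torch
import torch.nn.functional as F

def decode(objective, constraints, epsilons, T, vocab_size,
           eta_y=50.0, eta_lam=2.0, steps=100):
    """Sketch of Lagrangian decoding over token simplexes.

    `objective(y)` and each `constraints[i](y)` are assumed to map a
    (T, vocab_size) matrix of simplexes to a scalar tensor; `epsilons`
    holds the (possibly annealed) constraint thresholds.
    """
    # Log-space parameters; softmax(log_y) gives the simplexes y_1, ..., y_T,
    # initialized uniformly over the vocabulary.
    log_y = torch.zeros(T, vocab_size)
    lambdas = torch.zeros(len(constraints))

    for step in range(steps):
        y = F.softmax(log_y, dim=-1).requires_grad_(True)
        violations = torch.stack([g(y) - eps
                                  for g, eps in zip(constraints, epsilons)])
        lagrangian = objective(y) + (lambdas * violations).sum()

        # Gradient taken with respect to the simplexes themselves.
        (grad_y,) = torch.autograd.grad(lagrangian, y)

        # Exponentiated gradient descent on y: an additive update in log
        # space, renormalized by the softmax at the top of the loop.
        log_y = log_y - eta_y * grad_y
        # Gradient ascent on the nonnegative Lagrange multipliers.
        lambdas = torch.clamp(lambdas + eta_lam * violations.detach(), min=0.0)

    # One token id per position from the final simplexes.
    return F.softmax(log_y, dim=-1).argmax(dim=-1)
```

Parameterizing the simplexes in log space means the additive update with the gradient taken with respect to y is, after renormalization, exactly a multiplicative exponentiated-gradient step on y, which keeps every y_t on the probability simplex without an explicit projection.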