A Distributional Approach to Controlled Text Generation

Authors: Muhammad Khalifa, Hady Elsahar, Marc Dymetman

ICLR 2021

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct a first set of experiments over pointwise constraints, showing the advantages of our approach over a set of baselines in terms of obtaining a controlled LM that balances constraint satisfaction with divergence from the initial LM. We then perform experiments over distributional constraints, a unique feature of our approach, demonstrating its potential as a remedy to the problem of bias in language models. Through an ablation study, we show the effectiveness of our adaptive technique for obtaining faster convergence. |
| Researcher Affiliation | Collaboration | Muhammad Khalifa (Cairo University); Hady Elsahar (Naver Labs Europe); Marc Dymetman (Naver Labs Europe). Contact: {hady.elsahar,marc.dymetman}@naverlabs.com, m.khalifa@grad.fci-cu.edu.eg |
| Pseudocode | Yes | Algorithm 1: Computing λ |
| Open Source Code | Yes | Code available at https://github.com/naver/gdc |
| Open Datasets | Yes | For distributional and hybrid experiments, we fine-tune GPT-2 small (117M parameters) to produce biographies on a dataset of 700K Wikipedia biographies (Lebret et al., 2016), which we refer to as GPT-2bio. |
| Dataset Splits | Yes | We end up with a total of 4600 samples, of which we use 500 for validation and the rest for fine-tuning. |
| Hardware Specification | Yes | Each training run required 2 Nvidia V100 GPUs; the longest model took 72 hours to train. |
| Software Dependencies | No | The paper mentions software such as PyTorch and the Hugging Face library, but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | A list of the hyperparameters used for GDC and the baselines is given in Table 5. K refers to the number of gradient steps per iteration in Algorithm 2, N to the number of samples required, and the µ-tolerance to the minimum tolerated error ‖µ − µ̂(λ)‖₂² while optimizing λ; the λ learning rate is the SGD step size for updating λ in Algorithm 1. |
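The hyperparameters above govern the moment-matching loop of Algorithm 1: draw N samples, estimate the current moments µ̂(λ) by importance weighting, and update λ until the squared error ‖µ − µ̂(λ)‖₂² falls below the tolerance. The following is a minimal illustrative sketch under stated assumptions, not the authors' released implementation: names such as `update_lambda`, `phi_samples`, and `mu_bar` are made up for this example, and the simple proportional step on the moment error stands in for the paper's SGD update.

```python
import numpy as np

def update_lambda(phi_samples, log_ratios, mu_bar, lam, lr=0.05,
                  tolerance=1e-3, max_steps=100):
    """Sketch of the lambda update (moment matching) in the spirit of
    the paper's Algorithm 1. All names here are illustrative.

    phi_samples: (N, d) feature values phi(x) for N samples from a
                 proposal q; log_ratios: log a(x) - log q(x) per sample,
                 where a is the original LM; mu_bar: (d,) target moments.
    """
    for _ in range(max_steps):
        # Self-normalized importance weights for the EBM
        # P(x) proportional to a(x) * exp(lam . phi(x))
        logw = log_ratios + phi_samples @ lam
        w = np.exp(logw - logw.max())
        w /= w.sum()
        # Estimate of the current moments mu_hat(lambda)
        mu_hat = w @ phi_samples
        err = mu_bar - mu_hat
        if np.sum(err ** 2) < tolerance:   # ||mu - mu_hat(lambda)||_2^2
            break
        # Heuristic proportional step toward the target moments
        # (the paper uses SGD on the squared moment error)
        lam = lam + lr * err
    return lam
```

With a single binary feature (e.g. "biography mentions a female subject") and a uniform proposal, this loop drives the estimated moment toward the target value `mu_bar`, stopping once the squared error is below the tolerance.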