A Distributional Approach to Controlled Text Generation
Authors: Muhammad Khalifa, Hady Elsahar, Marc Dymetman
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct a first set of experiments over pointwise constraints showing the advantages of our approach over a set of baselines, in terms of obtaining a controlled LM balancing constraint satisfaction with divergence from the initial LM. We then perform experiments over distributional constraints, a unique feature of our approach, demonstrating its potential as a remedy to the problem of Bias in Language Models. Through an ablation study, we show the effectiveness of our adaptive technique for obtaining faster convergence. |
| Researcher Affiliation | Collaboration | Muhammad Khalifa (Cairo University); Hady Elsahar (Naver Labs Europe); Marc Dymetman (Naver Labs Europe); {hady.elsahar,marc.dymetman}@naverlabs.com; m.khalifa@grad.fci-cu.edu.eg |
| Pseudocode | Yes | Algorithm 1 Computing λ |
| Open Source Code | Yes | Code available on https://github.com/naver/gdc |
| Open Datasets | Yes | For distributional and hybrid experiments, we fine-tune GPT-2 small (117M params) to produce biographies on a dataset of 700K Wikipedia biographies (Lebret et al., 2016) which we refer to as GPT-2bio. |
| Dataset Splits | Yes | We end up with a total of 4600 samples out of which we use 500 for validation and the rest for fine-tuning. |
| Hardware Specification | Yes | Each training required 2 Nvidia V100 GPUs, the longest model took 72 hours to train. |
| Software Dependencies | No | The paper mentions software like PyTorch and Hugging Face library, but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | A list of the hyperparameters used for GDC and baselines is given in Table 5. K refers to the number of gradient steps per iteration in Algorithm 2, N refers to the number of samples required, µ_tolerance is the minimum tolerated error ‖µ̄ − µ̂(λ)‖₂² while optimizing λ, and the λ learning rate is the SGD step size for updating λ in Algorithm 1. (A toy sketch of this λ update is given below the table.) |
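As a reading aid for the Algorithm 1 and Table 5 entries above, here is a minimal, self-contained sketch of the λ update they describe: estimate the EBM's feature moments µ̂(λ) by self-normalized importance sampling over samples from the base LM, and run SGD on ‖µ̄ − µ̂(λ)‖₂² until it drops below the tolerance. This is not the authors' implementation (that lives at https://github.com/naver/gdc); the feature matrix `phi`, the target moments `mu_bar`, and all hyperparameter values below are illustrative placeholders.

```python
# Illustrative sketch of Algorithm 1 ("Computing lambda"), NOT the official
# implementation (see https://github.com/naver/gdc). Samples from the base LM
# a(x) are reweighted by exp(lambda . phi(x)), and lambda is tuned so that the
# self-normalized feature moments match the target moments mu_bar.
import torch

torch.manual_seed(0)

N = 10_000           # stand-in for "N, the number of samples required"
# phi(x_i): binary features of N samples drawn from the base LM (placeholder
# data; base rates 0.07 and 0.30 are arbitrary choices for this toy example).
phi = (torch.rand(N, 2) < torch.tensor([0.07, 0.30])).float()
mu_bar = torch.tensor([0.50, 0.50])   # target distributional moments (assumed)

lam = torch.zeros(2, requires_grad=True)
opt = torch.optim.SGD([lam], lr=0.5)  # "lambda learning rate" (placeholder value)
tolerance = 1e-4                      # "mu tolerance" (placeholder value)

for step in range(10_000):
    opt.zero_grad()
    # Importance weights w_i proportional to P(x_i)/a(x_i) = exp(lambda . phi(x_i)),
    # self-normalized over the sample batch.
    w = torch.softmax(phi @ lam, dim=0)
    mu_hat = w @ phi                              # estimated moments under the EBM
    err = torch.sum((mu_bar - mu_hat) ** 2)       # ||mu_bar - mu_hat(lambda)||_2^2
    if err.item() < tolerance:
        break
    err.backward()
    opt.step()

print(f"stopped at step {step}: lambda={lam.data.tolist()}, mu_hat={mu_hat.data.tolist()}")
```

In the paper, the λ found this way defines the target EBM P(x) = a(x)·exp(λ·φ(x)), which the KL-adaptive DPG training loop (Algorithm 2, with K gradient steps per iteration) then approximates with an autoregressive policy.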