Residual Energy-Based Models for Text Generation

Authors: Yuntian Deng, Anton Bakhtin, Myle Ott, Arthur Szlam, Marc'Aurelio Ranzato

ICLR 2020

Each reproducibility variable below is listed with its result and the supporting LLM response.

Research Type: Experimental
LLM Response: "Our experiments on two large language modeling datasets show that residual EBMs yield lower perplexity compared to locally normalized baselines. Moreover, generation via importance sampling is very efficient and of higher quality than the baseline models according to human evaluation."

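As background for this result, the modeling idea can be summarized as follows (a paraphrase; the symbols x_{1:p} for the prefix, P_LM for the pretrained language model, and E_theta for the energy network are approximate, not copied from the paper): the joint model reweights continuations sampled from the base LM by exp(-E_theta) and renormalizes globally.

```latex
% Paraphrased residual EBM over a continuation x_{p+1:T} given a prefix x_{1:p};
% symbol names are approximate, not quoted from the paper.
P_\theta(x_{p+1:T} \mid x_{1:p})
  = \frac{P_{\mathrm{LM}}(x_{p+1:T} \mid x_{1:p})\,
          e^{-E_\theta(x_{1:p},\, x_{p+1:T})}}
         {Z_\theta(x_{1:p})},
\qquad
Z_\theta(x_{1:p})
  = \sum_{x'_{p+1:T}} P_{\mathrm{LM}}(x'_{p+1:T} \mid x_{1:p})\,
    e^{-E_\theta(x_{1:p},\, x'_{p+1:T})}.
```

The intuition is that the energy term can down-weight continuations the base LM over-scores, while generation can still use the LM as a proposal distribution.
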
Researcher Affiliation: Collaboration
LLM Response: Yuntian Deng (Harvard University); Anton Bakhtin, Myle Ott, Arthur Szlam, and Marc'Aurelio Ranzato (Facebook AI Research).

Pseudocode: Yes
LLM Response: "The algorithm is shown in Algorithm 1, where we introduce an optional top-k constraint on the pretrained language model to improve the quality of samples in the set. Without the top-k constraint, as the number of samples goes to infinity, we would recover exact samples from the joint model distribution."

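The quoted procedure is importance resampling: draw a set of candidate continuations from the (optionally top-k constrained) base LM, weight each by exp(-E), and resample one. Below is a minimal Python sketch under that reading; sample_from_lm, energy, and n_samples are hypothetical placeholders standing in for the authors' actual models and interface.

```python
import math
import random

def sample_joint(sample_from_lm, energy, n_samples=128):
    """Hedged sketch of importance resampling from a residual EBM.

    `sample_from_lm` is assumed to draw one continuation from the pretrained
    (optionally top-k constrained) language model; `energy` is assumed to
    return E_theta(x) for a full prefix+continuation. Both are placeholders,
    not functions from the authors' code.
    """
    # 1) Draw candidate continuations from the base LM (the proposal).
    candidates = [sample_from_lm() for _ in range(n_samples)]

    # 2) Importance weights are proportional to exp(-E(x)); shift by the
    #    maximum for numerical stability before exponentiating.
    neg_energies = [-energy(x) for x in candidates]
    m = max(neg_energies)
    weights = [math.exp(e - m) for e in neg_energies]

    # 3) Resample one candidate in proportion to its weight. Without the
    #    top-k constraint, this approaches an exact sample from the joint
    #    model as n_samples grows.
    return random.choices(candidates, weights=weights, k=1)[0]
```

In this sketch the top-k constraint would live inside sample_from_lm (truncating each next-token distribution to its k most likely tokens before sampling), trading some bias for higher-quality candidates in the set.
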
Open Source Code: No
LLM Response: The paper mentions using models from the Hugging Face repository and NVIDIA/apex, but does not explicitly state that the authors' own source code for their method is available or provide a link to it.

Open Datasets: Yes
LLM Response: "We consider two datasets: the Toronto Book Corpus (Zhu et al., 2015; Kiros et al., 2015) and CC-News (Bakhtin et al., 2019)."

Dataset Splits: Yes
LLM Response: "Table 1: Validation and test perplexity on CC-News and Toronto Book Corpus." and "Figure 2: Left: PPL estimation for joint BIT-BASE on CC-News validation set as we vary the number of samples."

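The Figure 2 caption refers to estimating the joint model's perplexity, which requires estimating the intractable normalizer Z_theta. A natural estimator consistent with that description (and with the formulation sketched above) is a Monte Carlo average over samples from the base LM; this is a paraphrase, not necessarily the paper's exact estimator or notation.

```latex
% Monte Carlo estimate of the normalizer using n samples from the base LM
% (paraphrased; the quoted figure reports how the PPL estimate varies with n).
Z_\theta(x_{1:p})
  = \mathbb{E}_{x' \sim P_{\mathrm{LM}}(\cdot \mid x_{1:p})}
      \bigl[ e^{-E_\theta(x_{1:p},\, x')} \bigr]
  \approx \frac{1}{n} \sum_{i=1}^{n} e^{-E_\theta(x_{1:p},\, x'^{(i)})},
\qquad x'^{(i)} \sim P_{\mathrm{LM}}(\cdot \mid x_{1:p}).
```

Varying the number of samples n, as in the quoted Figure 2 caption, shows how stable the resulting perplexity estimate is.
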
Hardware Specification: Yes
LLM Response: "We train our models on 8 DGX nodes, each with 8 Nvidia V100s."

Software Dependencies: No
LLM Response: The paper mentions using the Hugging Face repository and NVIDIA/apex, but does not provide specific version numbers for these or other software components.

Experiment Setup: Yes
LLM Response: "Detailed hyper-parameter settings can be found in Appendix A.3." (Optimization settings are presented in Table 4, including fp16, batch size, warmup steps, max steps, max lr, and max grad norm.)

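For readers who want to mirror that setup, the skeleton below lists the same optimization fields as a plain Python dictionary. The concrete values live in the paper's Table 4 and are deliberately not reproduced here; everything in this snippet is a placeholder, not the authors' configuration.

```python
# Skeleton of the optimization settings named in Table 4.
# All values are placeholders (None / True is illustrative), NOT from the paper.
optimization_config = {
    "fp16": True,            # mixed-precision training (e.g., via NVIDIA/apex)
    "batch_size": None,      # fill in from the paper's Table 4
    "warmup_steps": None,    # learning-rate warmup duration
    "max_steps": None,       # total number of parameter updates
    "max_lr": None,          # peak learning rate after warmup
    "max_grad_norm": None,   # gradient clipping threshold
}
```
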