Residual Energy-Based Models for Text Generation
Authors: Yuntian Deng, Anton Bakhtin, Myle Ott, Arthur Szlam, Marc'Aurelio Ranzato
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on two large language modeling datasets show that residual EBMs yield lower perplexity compared to locally normalized baselines. Moreover, generation via importance sampling is very efficient and of higher quality than the baseline models according to human evaluation. |
| Researcher Affiliation | Collaboration | Yuntian Deng (Harvard University); Anton Bakhtin, Myle Ott, Arthur Szlam, Marc'Aurelio Ranzato (Facebook AI Research) |
| Pseudocode | Yes | The algorithm is shown in Algorithm 1, where we introduce an optional top-k constraint on the pretrained language model to improve the quality of samples in the set. Without the top-k constraint, as the number of samples goes to infinity, we would recover exact samples from the joint model distribution. (A sketch of this sampling procedure appears after the table.) |
| Open Source Code | No | The paper mentions using models from the Hugging Face repository and NVIDIA/apex, but it does not state that the authors' own implementation is released, nor does it provide a link to one. |
| Open Datasets | Yes | We consider two datasets: the Toronto Book Corpus (Zhu et al., 2015; Kiros et al., 2015) and CC-News (Bakhtin et al., 2019). |
| Dataset Splits | Yes | Table 1: Validation and test perplexity on CC-News and Toronto Book Corpus. Figure 2 (left): PPL estimation for joint BIT-BASE on the CC-News validation set as the number of samples varies. (A sketch of this estimator appears after the table.) |
| Hardware Specification | Yes | We train our models on 8 DGX nodes, each with 8 Nvidia V100s. |
| Software Dependencies | No | The paper mentions using the Hugging Face repository and NVIDIA/apex, but does not provide specific version numbers for these or other software components. |
| Experiment Setup | Yes | Detailed hyper-parameter settings can be found in Appendix A.3. (Optimization settings are presented in Table 4, including fp16, batch size, warmup steps, max steps, max learning rate, and max gradient norm.) |
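
As a companion to the Pseudocode row, here is a minimal sketch of how Algorithm 1's sampling scheme could look in practice: draw candidate continuations from the (optionally top-k constrained) pretrained language model and resample one of them with weight proportional to exp(-E(x)). The helpers `lm`, `energy_fn`, and `prefix_ids` are illustrative assumptions (a Hugging-Face-style generator and a sequence-level energy network), not the authors' released code.

```python
import torch
import torch.nn.functional as F

def sample_residual_ebm(lm, energy_fn, prefix_ids, n_samples=128,
                        top_k=10, max_new_tokens=40):
    """Draw candidate continuations from the base LM (the proposal), then
    resample one with weight exp(-E(x)); this approximates a sample from
    the joint residual model P_LM(x) * exp(-E(x)) / Z."""
    candidates, neg_energies = [], []
    for _ in range(n_samples):
        # Proposal step: top-k sampling from the pretrained LM.
        x = lm.generate(prefix_ids, do_sample=True, top_k=top_k,
                        max_new_tokens=max_new_tokens)
        candidates.append(x)
        # Score the full candidate sequence with the residual energy function.
        neg_energies.append(-energy_fn(x).reshape(()))
    # Importance resampling: lower energy -> higher chance of being kept.
    weights = F.softmax(torch.stack(neg_energies), dim=0)
    idx = torch.multinomial(weights, num_samples=1).item()
    return candidates[idx]
```

With no top-k constraint and the number of samples going to infinity, the resampled sequences approach exact samples from the joint model, matching the claim quoted in the Pseudocode row.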
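
For the PPL-estimation curve referenced in the Dataset Splits row (Figure 2, left), the following is a hedged sketch of the underlying estimator: the joint model's normalizer Z = E_{x'~P_LM}[exp(-E(x'))] is approximated with Monte Carlo samples from the base LM, which is why the estimate is plotted against the number of samples. `lm_logprob` is an assumed helper that returns the base LM's log-probability of a full sequence; the other names match the sketch above.

```python
import torch

def estimate_joint_logprob(x, prefix_ids, lm, lm_logprob, energy_fn,
                           n_samples=1000, max_new_tokens=40):
    """log P_joint(x) = log P_LM(x) - E(x) - log Z, where log Z is
    estimated as the log-mean-exp of -E over samples from the base LM."""
    neg_energies = []
    for _ in range(n_samples):
        # Sample from the base LM to estimate the partition function.
        x_prime = lm.generate(prefix_ids, do_sample=True,
                              max_new_tokens=max_new_tokens)
        neg_energies.append(-energy_fn(x_prime).reshape(()))
    log_z = torch.logsumexp(torch.stack(neg_energies), dim=0) \
        - torch.log(torch.tensor(float(n_samples)))
    return lm_logprob(x) - energy_fn(x) - log_z
```

Perplexity is then obtained by exponentiating the negative average per-token log-probability over the validation set; as the number of samples grows, the log Z estimate stabilizes, which is the behavior Figure 2 (left) tracks.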