Residual Energy-Based Models for Text Generation

Authors: Yuntian Deng, Anton Bakhtin, Myle Ott, Arthur Szlam, Marc'Aurelio Ranzato

ICLR 2020

Each reproducibility variable below is listed with its result and the supporting LLM response.

Research Type: Experimental
LLM Response: "Our experiments on two large language modeling datasets show that residual EBMs yield lower perplexity compared to locally normalized baselines. Moreover, generation via importance sampling is very efficient and of higher quality than the baseline models according to human evaluation."

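As background for this result, the modeling idea can be summarized as follows (a paraphrase; the symbols x_{1:p} for the prefix, P_LM for the pretrained language model, and E_theta for the energy network are approximate, not copied from the paper): the joint model reweights continuations sampled from the base LM by exp(-E_theta) and renormalizes globally.

```latex
% Paraphrased residual EBM over a continuation x_{p+1:T} given a prefix x_{1:p};
% symbol names are approximate, not quoted from the paper.
P_\theta(x_{p+1:T} \mid x_{1:p})
  = \frac{P_{\mathrm{LM}}(x_{p+1:T} \mid x_{1:p})\,
          e^{-E_\theta(x_{1:p},\, x_{p+1:T})}}
         {Z_\theta(x_{1:p})},
\qquad
Z_\theta(x_{1:p})
  = \sum_{x'_{p+1:T}} P_{\mathrm{LM}}(x'_{p+1:T} \mid x_{1:p})\,
    e^{-E_\theta(x_{1:p},\, x'_{p+1:T})}.
```

The intuition is that the energy term can down-weight continuations the base LM over-scores, while generation can still use the LM as a proposal distribution.
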
Researcher Affiliation: Collaboration
LLM Response: Yuntian Deng (Harvard University); Anton Bakhtin, Myle Ott, Arthur Szlam, and Marc'Aurelio Ranzato (Facebook AI Research).

Pseudocode: Yes
LLM Response: "The algorithm is shown in Algorithm 1, where we introduce an optional top-k constraint on the pretrained language model to improve the quality of samples in the set. Without the top-k constraint, as the number of samples goes to infinity, we would recover exact samples from the joint model distribution."

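The quoted procedure is importance resampling: draw a set of candidate continuations from the (optionally top-k constrained) base LM, weight each by exp(-E), and resample one. Below is a minimal Python sketch under that reading; sample_from_lm, energy, and n_samples are hypothetical placeholders standing in for the authors' actual models and interface.

```python
import math
import random

def sample_joint(sample_from_lm, energy, n_samples=128):
    """Hedged sketch of importance resampling from a residual EBM.

    `sample_from_lm` is assumed to draw one continuation from the pretrained
    (optionally top-k constrained) language model; `energy` is assumed to
    return E_theta(x) for a full prefix+continuation. Both are placeholders,
    not functions from the authors' code.
    """
    # 1) Draw candidate continuations from the base LM (the proposal).
    candidates = [sample_from_lm() for _ in range(n_samples)]

    # 2) Importance weights are proportional to exp(-E(x)); shift by the
    #    maximum for numerical stability before exponentiating.
    neg_energies = [-energy(x) for x in candidates]
    m = max(neg_energies)
    weights = [math.exp(e - m) for e in neg_energies]

    # 3) Resample one candidate in proportion to its weight. Without the
    #    top-k constraint, this approaches an exact sample from the joint
    #    model as n_samples grows.
    return random.choices(candidates, weights=weights, k=1)[0]
```

In this sketch the top-k constraint would live inside sample_from_lm (truncating each next-token distribution to its k most likely tokens before sampling), trading some bias for higher-quality candidates in the set.
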
Open Source Code: No
LLM Response: The paper mentions using models from the Hugging Face repository and NVIDIA/apex, but does not explicitly state that the authors' own source code for their method is available or provide a link to it.

Open Datasets: Yes
LLM Response: "We consider two datasets: the Toronto Book Corpus (Zhu et al., 2015; Kiros et al., 2015) and CC-News (Bakhtin et al., 2019)."

Dataset Splits: Yes
LLM Response: "Table 1: Validation and test perplexity on CC-News and Toronto Book Corpus." and "Figure 2: Left: PPL estimation for joint BIT-BASE on CC-News validation set as we vary the number of samples."

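The Figure 2 caption refers to estimating the joint model's perplexity, which requires estimating the intractable normalizer Z_theta. A natural estimator consistent with that description (and with the formulation sketched above) is a Monte Carlo average over samples from the base LM; this is a paraphrase, not necessarily the paper's exact estimator or notation.

```latex
% Monte Carlo estimate of the normalizer using n samples from the base LM
% (paraphrased; the quoted figure reports how the PPL estimate varies with n).
Z_\theta(x_{1:p})
  = \mathbb{E}_{x' \sim P_{\mathrm{LM}}(\cdot \mid x_{1:p})}
      \bigl[ e^{-E_\theta(x_{1:p},\, x')} \bigr]
  \approx \frac{1}{n} \sum_{i=1}^{n} e^{-E_\theta(x_{1:p},\, x'^{(i)})},
\qquad x'^{(i)} \sim P_{\mathrm{LM}}(\cdot \mid x_{1:p}).
```

Varying the number of samples n, as in the quoted Figure 2 caption, shows how stable the resulting perplexity estimate is.
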
Hardware Specification: Yes
LLM Response: "We train our models on 8 DGX nodes, each with 8 Nvidia V100s."

Software Dependencies: No
LLM Response: The paper mentions using the Hugging Face repository and NVIDIA/apex, but does not provide specific version numbers for these or other software components.

Experiment Setup: Yes
LLM Response: "Detailed hyper-parameter settings can be found in Appendix A.3." (Optimization settings are presented in Table 4, including fp16, batch size, warmup steps, max steps, max lr, and max grad norm.)

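For readers who want to mirror that setup, the skeleton below lists the same optimization fields as a plain Python dictionary. The concrete values live in the paper's Table 4 and are deliberately not reproduced here; everything in this snippet is a placeholder, not the authors' configuration.

```python
# Skeleton of the optimization settings named in Table 4.
# All values are placeholders (None / True is illustrative), NOT from the paper.
optimization_config = {
    "fp16": True,            # mixed-precision training (e.g., via NVIDIA/apex)
    "batch_size": None,      # fill in from the paper's Table 4
    "warmup_steps": None,    # learning-rate warmup duration
    "max_steps": None,       # total number of parameter updates
    "max_lr": None,          # peak learning rate after warmup
    "max_grad_norm": None,   # gradient clipping threshold
}
```
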