Improving Neural Language Generation with Spectrum Control
Authors: Lingxiao Wang, Jing Huang, Kevin Huang, Ziniu Hu, Guangtao Wang, Quanquan Gu
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of our training framework with extensive experimental results on two tasks: language modeling and machine translation. |
| Researcher Affiliation | Collaboration | Lingxiao Wang¹, Jing Huang², Kevin Huang², Ziniu Hu¹, Guangtao Wang², Quanquan Gu¹; ¹Department of Computer Science, University of California, Los Angeles; ²JD AI Research, Mountain View, CA 94034 |
| Pseudocode | No | The paper describes its method using mathematical formulations and text, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states that its implementation is 'based on the open-source code for AWD-LSTM' and 'based on the official code for Transformer-XL', and for MT 'based on the open-sourced code provided by Ott et al. (2018)'. These refer to the codebases of baseline models or frameworks, not explicitly to the authors' own implementation of the Spectrum Control method. |
| Open Datasets | Yes | We consider two benchmark datasets for language modeling: WikiText-2 and WikiText-103, which consist of pre-processed Wikipedia articles and were introduced by Merity et al. (2018a). ... We compare various NMT models on the IWSLT 2014 German-English (De-En) and WMT 14 English-German (En-De) datasets. For IWSLT 2014 De-En, we follow the same setup as in (Gehring et al., 2017). ... (Cettolo et al., 2014). |
| Dataset Splits | Yes | For IWSLT 2014 De-En, we follow the same setup as in (Gehring et al., 2017). More specifically, we have 160K sentence pairs as the training data, 7K sentence pairs as the validation data, and we combine the tst2010, tst2011, tst2012, dev2010 and dev2012 datasets to form our test data. (A minimal sketch of assembling this test split appears below the table.) |
| Hardware Specification | Yes | For the WikiText-2 dataset, we use one NVIDIA Tesla V100 GPU and set the batch size to 80. For the WikiText-103 dataset, we use four NVIDIA Tesla V100 GPUs and set the batch size to 40. For WMT 14, we use four NVIDIA Tesla V100 GPUs and set the max tokens to 3500. |
| Software Dependencies | No | The paper mentions using 'AWD-LSTM model', 'Transformer-XL based models', and 'open-sourced code provided by Ott et al. (2018)' (fairseq), but does not specify version numbers for these software components or other dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | On the small WikiText-2 dataset, we implement our method based on the state-of-the-art AWD-LSTM model (Merity et al., 2018a). It is a 3-layer LSTM model with 1150-dimensional hidden states and 400-dimensional embeddings. ... For the parameters {λ_i} (i = 1, …, 4) of the orthogonal regularizations, we tune them by grid search over {0.01, 0.1, 1, 10}. For the parameters λ_e, λ_p of the spectrum control, we tune them over the grid {0.1, 1, 10, 100}. (A sketch of this grid search follows the table.) |
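
As a point of reference for the Dataset Splits row, the sketch below shows one way to assemble the IWSLT 2014 De-En test set described above by concatenating the tst2010, tst2011, tst2012, dev2010 and dev2012 subsets. The directory layout and `<subset>.<lang>` file naming are assumptions for illustration, not details taken from the paper.

```python
# Minimal sketch: build the IWSLT'14 De-En test split by concatenating the
# subsets listed in the Dataset Splits row. File naming and layout are assumed.
from pathlib import Path

SUBSETS = ["tst2010", "tst2011", "tst2012", "dev2010", "dev2012"]

def build_test_split(data_dir: str, out_dir: str, src: str = "de", tgt: str = "en") -> None:
    """Concatenate the listed subsets into test.de / test.en files."""
    data_path, out_path = Path(data_dir), Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    for lang in (src, tgt):
        lines = []
        for subset in SUBSETS:
            # Assumed convention: one sentence per line in <subset>.<lang>.
            lines.extend((data_path / f"{subset}.{lang}").read_text(encoding="utf-8").splitlines())
        (out_path / f"test.{lang}").write_text("\n".join(lines) + "\n", encoding="utf-8")

if __name__ == "__main__":
    build_test_split("iwslt14_deen", "iwslt14_deen/splits")
```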
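The Experiment Setup row reports grid search over the orthogonal-regularization weights λ_1, …, λ_4 (grid {0.01, 0.1, 1, 10}) and the spectrum-control weights λ_e, λ_p (grid {0.1, 1, 10, 100}). The sketch below illustrates such a sweep; `train_and_evaluate` is a hypothetical placeholder for one full training run, and sweeping a single shared value for the four orthogonal weights is an assumption about the protocol rather than something the paper states.

```python
# Minimal sketch of the hyperparameter grid search reported in the
# Experiment Setup row. `train_and_evaluate` is a hypothetical placeholder
# for a full training run returning a validation score (lower is better).
import itertools

ORTHO_GRID = [0.01, 0.1, 1, 10]    # grid for lambda_1 ... lambda_4
SPECTRUM_GRID = [0.1, 1, 10, 100]  # grid for lambda_e and lambda_p

def grid_search(train_and_evaluate):
    """Return the best (config, validation score) over the stated grids."""
    best_config, best_score = None, float("inf")
    # Assumption: the four orthogonal weights share one swept value and are
    # tuned jointly with lambda_e and lambda_p.
    for lam_ortho, lam_e, lam_p in itertools.product(ORTHO_GRID, SPECTRUM_GRID, SPECTRUM_GRID):
        config = {"lambda_1..4": lam_ortho, "lambda_e": lam_e, "lambda_p": lam_p}
        score = train_and_evaluate(config)  # e.g. validation perplexity
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score
```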