Improving Neural Language Generation with Spectrum Control
Authors: Lingxiao Wang, Jing Huang, Kevin Huang, Ziniu Hu, Guangtao Wang, Quanquan Gu
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of our training framework with extensive experimental results on two tasks: language modeling and machine translation. |
| Researcher Affiliation | Collaboration | Lingxiao Wang¹, Jing Huang², Kevin Huang², Ziniu Hu¹, Guangtao Wang², Quanquan Gu¹; ¹Department of Computer Science, University of California, Los Angeles; ²JD AI Research, Mountain View, CA 94034 |
| Pseudocode | No | The paper describes its method using mathematical formulations and text, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states that its implementation is 'based on the open-source code for AWD-LSTM' and 'based on the official code for Transformer-XL', and for MT 'based on the open-sourced code provided by Ott et al. (2018)'. These refer to the codebases of baseline models or frameworks, not explicitly to the authors' own implementation of the Spectrum Control method. |
| Open Datasets | Yes | We consider two benchmark datasets for language modeling: WikiText-2 and WikiText-103, which consist of pre-processed Wikipedia articles and were introduced by Merity et al. (2018a). ... We compare various NMT models on the IWSLT 2014 German-English (De-En) and WMT 14 English-German (En-De) datasets. For IWSLT 2014 De-En, we follow the same setup as in (Gehring et al., 2017). ... (Cettolo et al., 2014). |
| Dataset Splits | Yes | For IWSLT 2014 De-En, we follow the same setup as in (Gehring et al., 2017). More specifically, we have 160K sentence pairs as the training data, 7K sentence pairs as the validation data, and we combine the tst2010, tst2011, tst2012, dev2010 and dev2012 datasets to form our test data. (A minimal sketch of assembling this test split appears below the table.) |
| Hardware Specification | Yes | For the WikiText-2 dataset, we use one NVIDIA Tesla V100 GPU and set the batch size to 80. For the WikiText-103 dataset, we use four NVIDIA Tesla V100 GPUs and set the batch size to 40. For WMT 14, we use four NVIDIA Tesla V100 GPUs and set the max tokens to 3500. |
| Software Dependencies | No | The paper mentions using 'AWD-LSTM model', 'Transformer-XL based models', and 'open-sourced code provided by Ott et al. (2018)' (fairseq), but does not specify version numbers for these software components or other dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | On the small WikiText-2 dataset, we implement our method based on the state-of-the-art AWD-LSTM model (Merity et al., 2018a). It is a 3-layer LSTM model with 1150-dimensional hidden states and 400-dimensional embeddings. ... For the parameters {λ_i} (i = 1, …, 4) of the orthogonal regularizations, we tune them by grid search over {0.01, 0.1, 1, 10}. For the parameters λ_e, λ_p of the spectrum control, we tune them over the grid {0.1, 1, 10, 100}. (A sketch of this grid search follows the table.) |
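
As a point of reference for the Dataset Splits row, the sketch below shows one way to assemble the IWSLT 2014 De-En test set described above by concatenating the tst2010, tst2011, tst2012, dev2010 and dev2012 subsets. The directory layout and `<subset>.<lang>` file naming are assumptions for illustration, not details taken from the paper.

```python
# Minimal sketch: build the IWSLT'14 De-En test split by concatenating the
# subsets listed in the Dataset Splits row. File naming and layout are assumed.
from pathlib import Path

SUBSETS = ["tst2010", "tst2011", "tst2012", "dev2010", "dev2012"]

def build_test_split(data_dir: str, out_dir: str, src: str = "de", tgt: str = "en") -> None:
    """Concatenate the listed subsets into test.de / test.en files."""
    data_path, out_path = Path(data_dir), Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    for lang in (src, tgt):
        lines = []
        for subset in SUBSETS:
            # Assumed convention: one sentence per line in <subset>.<lang>.
            lines.extend((data_path / f"{subset}.{lang}").read_text(encoding="utf-8").splitlines())
        (out_path / f"test.{lang}").write_text("\n".join(lines) + "\n", encoding="utf-8")

if __name__ == "__main__":
    build_test_split("iwslt14_deen", "iwslt14_deen/splits")
```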
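The Experiment Setup row reports grid search over the orthogonal-regularization weights λ_1, …, λ_4 (grid {0.01, 0.1, 1, 10}) and the spectrum-control weights λ_e, λ_p (grid {0.1, 1, 10, 100}). The sketch below illustrates such a sweep; `train_and_evaluate` is a hypothetical placeholder for one full training run, and sweeping a single shared value for the four orthogonal weights is an assumption about the protocol rather than something the paper states.

```python
# Minimal sketch of the hyperparameter grid search reported in the
# Experiment Setup row. `train_and_evaluate` is a hypothetical placeholder
# for a full training run returning a validation score (lower is better).
import itertools

ORTHO_GRID = [0.01, 0.1, 1, 10]    # grid for lambda_1 ... lambda_4
SPECTRUM_GRID = [0.1, 1, 10, 100]  # grid for lambda_e and lambda_p

def grid_search(train_and_evaluate):
    """Return the best (config, validation score) over the stated grids."""
    best_config, best_score = None, float("inf")
    # Assumption: the four orthogonal weights share one swept value and are
    # tuned jointly with lambda_e and lambda_p.
    for lam_ortho, lam_e, lam_p in itertools.product(ORTHO_GRID, SPECTRUM_GRID, SPECTRUM_GRID):
        config = {"lambda_1..4": lam_ortho, "lambda_e": lam_e, "lambda_p": lam_p}
        score = train_and_evaluate(config)  # e.g. validation perplexity
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score
```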