reproducibilityindex.ai

Improving Neural Language Modeling via Adversarial Training

Authors: Dilin Wang, Chengyue Gong, Qiang Liu

ICML 2019

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirically, we show that our method improves on the single model state-of-the-art results for language modeling on Penn Treebank (PTB) and Wikitext-2, achieving test perplexity scores of 46.01 and 38.65, respectively. ... We demonstrate the effectiveness of our method in two applications: neural language modeling and neural machine translation, and compare them with state-of-the-art architectures and learning methods.
Researcher Affiliation	Academia	Department of Computer Science, UT Austin. Correspondence to: Dilin Wang <dilin@cs.utexas.edu>, Chengyue Gong <cygong@cs.utexas.edu>.
Pseudocode	Yes	Algorithm 1 Adversarial MLE Training
Open Source Code	Yes	Our code is available at: https://github.com/ChengyueGongR/advsoft.
Open Datasets	Yes	We test our method on three benchmark datasets: Penn Treebank (PTB), Wikitext-2 (WT2) and Wikitext-103 (WT103). ... The PTB corpus (Marcus et al., 1993) has been a standard dataset used for benchmarking language models.
Dataset Splits	Yes	PTB: The PTB corpus (Marcus et al., 1993) has been a standard dataset used for benchmarking language models. It consists of 923k training, 73k validation and 82k test words.
Hardware Specification	No	The paper mentions 'GPUs' in a general sense but does not provide specific hardware details such as exact GPU or CPU models, processor types, or memory amounts used for running experiments.
Software Dependencies	No	The paper mentions using 'Tensor2Tensor (Vaswani et al., 2018)' for implementation but does not specify version numbers for this or any other software libraries or dependencies used in the experiments.
Experiment Setup	Yes	We set α = 0.005 for the rest of experiments unless otherwise specified. ... For Transformer-Small, we stack a 4-layer encoder and a 4-layer decoder with 256-dimensional hidden units per layer. For Transformer-Base, we set the batch size to 6400 and the dropout rate to 0.4 following Wang et al. (2019).
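
For anyone attempting a rerun, it may help to record the settings quoted in the Experiment Setup row in one place, along with what the excerpt leaves unstated. The sketch below is only an illustration: the names `ReportedSetup`, `DEFAULT_ALPHA`, and `missing_fields` are hypothetical and do not come from the paper or its repository, and fields set to `None` mean only that the quoted excerpt does not give a value, not that the paper omits it.

```python
from dataclasses import dataclass, asdict
from typing import List, Optional

# Quoted default for alpha ("unless otherwise specified" in the excerpt).
# Whether it applies to each Transformer setup is not stated in the excerpt.
DEFAULT_ALPHA = 0.005


@dataclass(frozen=True)
class ReportedSetup:
    """Hypothetical container for settings quoted in the table above.

    A value of None means the quoted excerpt does not state that setting.
    """
    name: str
    encoder_layers: Optional[int] = None
    decoder_layers: Optional[int] = None
    hidden_units: Optional[int] = None
    batch_size: Optional[int] = None
    dropout: Optional[float] = None


# Only values that appear verbatim in the Experiment Setup row are filled in.
TRANSFORMER_SMALL = ReportedSetup(
    name="Transformer-Small",
    encoder_layers=4,
    decoder_layers=4,
    hidden_units=256,
)
TRANSFORMER_BASE = ReportedSetup(
    name="Transformer-Base",
    batch_size=6400,
    dropout=0.4,
)


def missing_fields(setup: ReportedSetup) -> List[str]:
    """Return the sorted names of settings the excerpt leaves unstated."""
    return sorted(key for key, value in asdict(setup).items() if value is None)


if __name__ == "__main__":
    for setup in (TRANSFORMER_SMALL, TRANSFORMER_BASE):
        print(setup.name, "unstated in excerpt:", missing_fields(setup))
```

Listing the unstated fields makes explicit which values would still need to be looked up in the paper or its released code before a run could be configured.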