Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Improving Neural Language Modeling via Adversarial Training
Authors: Dilin Wang, Chengyue Gong, Qiang Liu
ICML 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we show that our method improves on the single model state-of-the-art results for language modeling on Penn Treebank (PTB) and Wikitext-2, achieving test perplexity scores of 46.01 and 38.65, respectively. ... We demonstrate the effectiveness of our method in two applications: neural language modeling and neural machine translation, and compare them with state-of-the-art architectures and learning methods. |
| Researcher Affiliation | Academia | ¹Department of Computer Science, UT Austin. Correspondence to: Dilin Wang <EMAIL>, Chengyue Gong <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Adversarial MLE Training |
| Open Source Code | Yes | Our code is available at: https://github.com/ChengyueGongR/advsoft. |
| Open Datasets | Yes | We test our method on three benchmark datasets: Penn Treebank (PTB), Wikitext-2 (WT2) and Wikitext-103 (WT103). ... The PTB corpus (Marcus et al., 1993) has been a standard dataset used for benchmarking language models. |
| Dataset Splits | Yes | PTB: The PTB corpus (Marcus et al., 1993) has been a standard dataset used for benchmarking language models. It consists of 923k training, 73k validation and 82k test words. |
| Hardware Specification | No | The paper mentions 'GPUs' in a general sense but does not provide specific hardware details such as exact GPU or CPU models, processor types, or memory amounts used for running experiments. |
| Software Dependencies | No | The paper mentions using 'Tensor2Tensor (Vaswani et al., 2018)' for implementation but does not specify version numbers for this or any other software libraries or dependencies used in the experiments. |
| Experiment Setup | Yes | We set α = 0.005 for the rest of experiments unless otherwise specified. ... For Transformer-Small, we stack a 4-layer encoder and a 4-layer decoder with 256-dimensional hidden units per layer. For Transformer-Base, we set the batch size to 6400 and the dropout rate to 0.4 following Wang et al. (2019). |
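
The Experiment Setup row quotes a handful of hyperparameters. As a reading aid, they could be collected in a small Python structure along the following lines. This is only a sketch: the class name, field names, and helper function are hypothetical and do not come from the paper or its repository, and any value not quoted in the table is left as `None` rather than guessed. The table does not say what α controls or which unit the batch size is measured in, so neither is assumed here.

```python
from dataclasses import asdict, dataclass
from typing import Optional

# alpha as quoted in the table ("We set α = 0.005 ..."); its exact role
# in the method is not described in the table, so none is assumed here.
ALPHA = 0.005


@dataclass(frozen=True)
class TransformerSetup:
    """Hypothetical container; field names are illustrative only."""
    name: str
    encoder_layers: Optional[int] = None
    decoder_layers: Optional[int] = None
    hidden_size: Optional[int] = None
    batch_size: Optional[int] = None   # unit is not stated in the table
    dropout: Optional[float] = None


# Only the values quoted in the table are filled in.
TRANSFORMER_SMALL = TransformerSetup(
    name="Transformer-Small",
    encoder_layers=4,
    decoder_layers=4,
    hidden_size=256,
)
TRANSFORMER_BASE = TransformerSetup(
    name="Transformer-Base",
    batch_size=6400,
    dropout=0.4,
)


def reported_fields(setup: TransformerSetup) -> dict:
    """Return only the fields that the table actually reports."""
    return {key: value for key, value in asdict(setup).items() if value is not None}


if __name__ == "__main__":
    print("alpha:", ALPHA)
    for setup in (TRANSFORMER_SMALL, TRANSFORMER_BASE):
        print(reported_fields(setup))
```

Leaving unreported fields as `None` keeps the gaps visible: anyone attempting a reproduction would still need to take the remaining settings from the paper itself or from the linked code.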