Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Defending Against Neural Fake News

Authors: Rowan Zellers, Ari Holtzman, Hannah Rashkin, Yonatan Bisk, Ali Farhadi, Franziska Roesner, Yejin Choi

NeurIPS 2019

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We find that best current discriminators can classify neural fake news from real, human-written, news with 73% accuracy, assuming access to a moderate level of training data. Counterintuitively, the best defense against Grover turns out to be Grover itself, with 92% accuracy, demonstrating the importance of public release of strong generators. We investigate these results further, showing that exposure bias and sampling strategies that alleviate its effects both leave artifacts that similar discriminators can pick up on.
Researcher Affiliation	Collaboration	Rowan Zellers, Ari Holtzman, Hannah Rashkin, Yonatan Bisk, Ali Farhadi~, Franziska Roesner, Yejin Choi~; Paul G. Allen School of Computer Science & Engineering, University of Washington; ~Allen Institute for Artificial Intelligence
Pseudocode	No	The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code	Yes	We conclude by discussing ethical issues regarding the technology, and plan to release Grover publicly, helping pave the way for better detection of neural fake news. We thus released our models to researchers (Zellers, 2019). https://rowanzellers.com/grover
Open Datasets	No	We present Real News, a large corpus of news articles from Common Crawl. Training Grover requires a large corpus of news articles with metadata, but none currently exists. Thus, we construct one by scraping dumps from Common Crawl, limiting ourselves to the 5000 news domains indexed by Google News. We used the Newspaper Python library to extract the body and metadata from each article. News from Common Crawl dumps from December 2016 through March 2019 were used as training data; articles published in April 2019 from the April 2019 dump were used for evaluation. After deduplication, Real News is 120 gigabytes without compression. The paper describes the creation of a custom dataset but does not provide a specific link, DOI, or repository for public access to their curated Real News dataset.
Dataset Splits	Yes	Using 10k news articles from April 2019, we generate article body text; another 10k articles are used as a set of human-written news articles. We split the articles in a balanced way, with 10k for training (5k per label), 2k for validation, and 8k for testing.
Hardware Specification	Yes	We trained Grover-Mega for 800k iterations, using a batch size of 512 and 256 TPU v3 cores. Training time was two weeks.
Software Dependencies	No	The paper mentions using a "Newspaper Python library" and building Grover with the "same architecture as for GPT2" but does not specify version numbers for Python, libraries like PyTorch or TensorFlow, or other software components crucial for replication.
Experiment Setup	Yes	Other optimization hyperparameters are in Appendix A. We trained Grover-Mega for 800k iterations, using a batch size of 512 and 256 TPU v3 cores. Training time was two weeks.
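
The Dataset Splits row describes a balanced 10k/2k/8k split over 10k human-written and 10k generated articles. A minimal sketch of one way to produce such a split is given below. It is not the authors' code: the function name `balanced_split`, the seed, the placeholder article IDs, and the assumption that every split (not only training) has equal counts per label are all illustrative.

```python
import random


def balanced_split(human, machine, sizes=(10_000, 2_000, 8_000), seed=0):
    """Split two labeled pools into train/val/test with equal labels in each.

    Hypothetical helper: the paper reports only the split sizes, so the
    shuffling and the per-split 50/50 balance are assumptions.
    """
    per_label_needed = sum(size // 2 for size in sizes)
    if len(human) < per_label_needed or len(machine) < per_label_needed:
        raise ValueError("not enough articles per label for the requested sizes")

    rng = random.Random(seed)
    human, machine = list(human), list(machine)
    rng.shuffle(human)
    rng.shuffle(machine)

    splits, start = [], 0
    for size in sizes:
        half = size // 2
        # Take the next `half` items from each pool so splits never overlap.
        part = [(item, "human") for item in human[start:start + half]]
        part += [(item, "machine") for item in machine[start:start + half]]
        rng.shuffle(part)
        splits.append(part)
        start += half
    return splits


# Placeholder IDs stand in for the real and generated April 2019 articles.
human_ids = [f"h{i}" for i in range(10_000)]
machine_ids = [f"m{i}" for i in range(10_000)]
train, val, test = balanced_split(human_ids, machine_ids)
```

Fixing the seed makes the split repeatable, which is the kind of detail the Dataset Splits variable is meant to capture.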