Defending Against Neural Fake News
Authors: Rowan Zellers, Ari Holtzman, Hannah Rashkin, Yonatan Bisk, Ali Farhadi, Franziska Roesner, Yejin Choi
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We find that best current discriminators can classify neural fake news from real, human-written, news with 73% accuracy, assuming access to a moderate level of training data. Counterintuitively, the best defense against Grover turns out to be Grover itself, with 92% accuracy, demonstrating the importance of public release of strong generators. We investigate these results further, showing that exposure bias and sampling strategies that alleviate its effects both leave artifacts that similar discriminators can pick up on. |
| Researcher Affiliation | Collaboration | Rowan Zellers, Ari Holtzman, Hannah Rashkin, Yonatan Bisk, Ali Farhadi~, Franziska Roesner, Yejin Choi~. Paul G. Allen School of Computer Science & Engineering, University of Washington; ~Allen Institute for Artificial Intelligence |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We conclude by discussing ethical issues regarding the technology, and plan to release Grover publicly, helping pave the way for better detection of neural fake news. We thus released our models to researchers (Zellers, 2019). https://rowanzellers.com/grover |
| Open Datasets | No | We present Real News, a large corpus of news articles from Common Crawl. Training Grover requires a large corpus of news articles with metadata, but none currently exists. Thus, we construct one by scraping dumps from Common Crawl, limiting ourselves to the 5000 news domains indexed by Google News. We used the Newspaper Python library to extract the body and metadata from each article. News from Common Crawl dumps from December 2016 through March 2019 were used as training data; articles published in April 2019 from the April 2019 dump were used for evaluation. After deduplication, Real News is 120 gigabytes without compression. The paper describes the creation of a custom dataset but does not provide a specific link, DOI, or repository for public access to their curated Real News dataset. |
| Dataset Splits | Yes | Using 10k news articles from April 2019, we generate article body text; another 10k articles are used as a set of human-written news articles. We split the articles in a balanced way, with 10k for training (5k per label), 2k for validation, and 8k for testing. |
| Hardware Specification | Yes | We trained Grover-Mega for 800k iterations, using a batch size of 512 and 256 TPU v3 cores. Training time was two weeks. |
| Software Dependencies | No | The paper mentions using a "Newspaper Python library" and building Grover with the "same architecture as for GPT2" but does not specify version numbers for Python, libraries like PyTorch or TensorFlow, or other software components crucial for replication. |
| Experiment Setup | Yes | Other optimization hyperparameters are in Appendix A. We trained Grover-Mega for 800k iterations, using a batch size of 512 and 256 TPU v3 cores. Training time was two weeks. |
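
The balanced split quoted in the Dataset Splits row (10k training, 2k validation, 8k testing, drawn from 10k generated and 10k human-written articles) could be reproduced roughly as follows. This is a minimal sketch and not the authors' code: the function name `balanced_split`, the label strings, and the seed are hypothetical, and it assumes the validation and test sets are balanced per label in the same way the training set is (the source states 5k per label only for training).

```python
import random


def balanced_split(human, machine, seed=0):
    """Split two equally sized article lists into balanced train/val/test sets.

    Assumed proportions per label: 50% train, 10% validation, 40% test,
    which gives 5k / 1k / 4k per label for 10k articles per label.
    """
    if len(human) != len(machine):
        raise ValueError("expected the same number of articles per label")

    rng = random.Random(seed)
    # Copy before shuffling so the caller's lists are left untouched.
    human, machine = list(human), list(machine)
    rng.shuffle(human)
    rng.shuffle(machine)

    n = len(human)
    n_train = n // 2   # 5k of 10k per label
    n_val = n // 10    # 1k of 10k per label (assumption: balanced like train)
    bounds = {
        "train": (0, n_train),
        "val": (n_train, n_train + n_val),
        "test": (n_train + n_val, n),
    }

    splits = {}
    for name, (start, end) in bounds.items():
        examples = [(a, "human") for a in human[start:end]]
        examples += [(a, "machine") for a in machine[start:end]]
        rng.shuffle(examples)  # mix the two labels within each split
        splits[name] = examples
    return splits
```

Called with two lists of 10k articles each, this yields splits of 10k, 2k, and 8k examples, each with equal counts per label.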