Capturing the Style of Fake News

Authors: Piotr Przybyła (pp. 490-497)

AAAI 2020

Reproducibility assessment. For each variable: the result, followed by the LLM response supporting it.
Research Type: Experimental. LLM Response: "In this study we aim to explore automatic methods that can detect online documents of low credibility, especially fake news, based on the style they are written in. We show that general-purpose text classifiers, despite seemingly good performance when evaluated simplistically, in fact overfit to sources of documents in training data. In order to achieve a truly style-based prediction, we gather a corpus of 103,219 documents from 223 online sources labelled by media experts, devise realistic evaluation scenarios and design two new classifiers: a neural network and a model based on stylometric features. The evaluation shows that the proposed classifiers maintain high accuracy in case of documents on previously unseen topics (e.g. new events) and from previously unseen sources (e.g. emerging news websites)."
Researcher Affiliation: Academia. LLM Response: Piotr Przybyła, Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland (piotr.przybyla@ipipan.waw.pl).
Pseudocode: No. LLM Response: The paper describes the algorithms and models used (stylometric classifier, BiLSTMAvg neural network, bag of words, BERT) but does not include any structured pseudocode or algorithm blocks.
Open Source Code: Yes. LLM Response: "In order to encourage and facilitate further research, we make the corpus, the evaluation scenarios and the code (for the stylometric and neural classifiers) available online." (https://github.com/piotrmp/fakestyle)
Open Datasets: Yes. LLM Response: "In order to encourage and facilitate further research, we make the corpus, the evaluation scenarios and the code (for the stylometric and neural classifiers) available online." (https://github.com/piotrmp/fakestyle)
Dataset Splits: Yes. LLM Response: "The main evaluation procedure involves running the model construction and prediction in a 5-fold cross validation (CV) scenario and comparing its output to true labels."
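The 5-fold cross-validation protocol quoted above can be sketched in plain Python. This is a minimal illustration with placeholder train/predict callables, not the paper's actual models or its released evaluation code; the function name and the interleaved fold assignment are assumptions for the sketch.

```python
from statistics import mean

def five_fold_cv(documents, labels, train_fn, predict_fn, k=5):
    """Run k-fold cross-validation: train on k-1 folds, predict on the
    held-out fold, and compare predictions to the true labels."""
    # Assign every k-th item to the same fold (simple interleaved split).
    folds = [list(range(i, len(documents), k)) for i in range(k)]
    accuracies = []
    for held_out in folds:
        test_set = set(held_out)
        train_idx = [i for i in range(len(documents)) if i not in test_set]
        # Build the model on the training folds only.
        model = train_fn([documents[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        # Predict on the held-out fold and score against true labels.
        preds = [predict_fn(model, documents[i]) for i in held_out]
        accuracies.append(mean(int(p == labels[i])
                               for p, i in zip(preds, held_out)))
    return mean(accuracies)
```

Any classifier exposing a train function and a per-item predict function can be dropped in; the paper's realistic scenarios additionally constrain how folds are formed (by topic or by source), which this sketch does not model.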
Hardware Specification: No. LLM Response: The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies: No. LLM Response: The paper mentions several software components (Stanford CoreNLP, Mallet, word2vec, the glmnet package in R, TensorFlow, and BERT) but does not specify version numbers for these dependencies, which are necessary for full reproducibility.
Experiment Setup: Yes. LLM Response: "The neural network is implemented and trained in TensorFlow for 10 epochs with sentence length limited to 120 tokens and document length limited to 50 sentences."
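The input-size limits quoted above (120 tokens per sentence, 50 sentences per document) amount to a truncation-and-padding step before the network. A minimal sketch follows, assuming a tokenised document represented as a list of sentences (each a list of token strings) and a hypothetical padding token; this is independent of the paper's actual preprocessing code.

```python
MAX_SENTENCES = 50   # document length limit from the paper
MAX_TOKENS = 120     # sentence length limit from the paper
PAD = "<pad>"        # hypothetical padding token (an assumption)

def clip_and_pad(document):
    """Truncate a tokenised document to at most MAX_SENTENCES sentences of
    MAX_TOKENS tokens each, padding short sentences and short documents."""
    clipped = []
    for sentence in document[:MAX_SENTENCES]:
        tokens = sentence[:MAX_TOKENS]
        # Right-pad each sentence to a fixed width.
        clipped.append(tokens + [PAD] * (MAX_TOKENS - len(tokens)))
    # Pad the document with empty sentences up to the fixed height.
    while len(clipped) < MAX_SENTENCES:
        clipped.append([PAD] * MAX_TOKENS)
    return clipped
```

The result is a fixed 50 x 120 grid of tokens, the shape a sentence-level BiLSTM with averaging would consume after token-to-embedding lookup.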