Capturing the Style of Fake News
Authors: Piotr Przybyła
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this study we aim to explore automatic methods that can detect online documents of low credibility, especially fake news, based on the style they are written in. We show that general-purpose text classifiers, despite seemingly good performance when evaluated simplistically, in fact overfit to sources of documents in training data. In order to achieve a truly style-based prediction, we gather a corpus of 103,219 documents from 223 online sources labelled by media experts, devise realistic evaluation scenarios and design two new classifiers: a neural network and a model based on stylometric features. The evaluation shows that the proposed classifiers maintain high accuracy in case of documents on previously unseen topics (e.g. new events) and from previously unseen sources (e.g. emerging news websites). |
| Researcher Affiliation | Academia | Piotr Przybyła Institute of Computer Science, Polish Academy of Sciences Warsaw, Poland piotr.przybyla@ipipan.waw.pl |
| Pseudocode | No | The paper describes the algorithms and models used (stylometric classifier, BiLSTMAvg neural network, bag-of-words, BERT) but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | In order to encourage and facilitate further research, we make the corpus, the evaluation scenarios and the code (for the stylometric and neural classifiers) available online1. 1https://github.com/piotrmp/fakestyle |
| Open Datasets | Yes | In order to encourage and facilitate further research, we make the corpus, the evaluation scenarios and the code (for the stylometric and neural classifiers) available online1. 1https://github.com/piotrmp/fakestyle |
| Dataset Splits | Yes | The main evaluation procedure involves running the model construction and prediction in a 5-fold cross validation (CV) scenario and comparing its output to true labels. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions several software components, including Stanford CoreNLP, Mallet, word2vec, the glmnet package in R, TensorFlow, and BERT, but it does not specify version numbers for these dependencies, which are necessary for full reproducibility. |
| Experiment Setup | Yes | The neural network is implemented and trained in TensorFlow for 10 epochs with sentence length limited to 120 tokens and document length limited to 50 sentences. |
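The 5-fold cross-validation protocol quoted above can be sketched as follows. This is a minimal illustration using plain Python; the function names and the round-robin fold assignment are assumptions for this sketch, not the authors' actual code (their implementation is at the linked GitHub repository).

```python
# Hypothetical sketch of a 5-fold cross-validation split as described
# in the paper's evaluation procedure. Index assignment is round-robin
# for simplicity; the authors' actual fold construction may differ
# (e.g. it may group documents by source or topic).

def five_fold_indices(n_docs, n_folds=5):
    """Partition document indices 0..n_docs-1 into n_folds disjoint folds."""
    folds = [[] for _ in range(n_folds)]
    for i in range(n_docs):
        folds[i % n_folds].append(i)
    return folds

def cross_validate_splits(n_docs, n_folds=5):
    """Yield (train_indices, test_indices) for each of the n_folds rounds."""
    folds = five_fold_indices(n_docs, n_folds)
    for k in range(n_folds):
        test_idx = folds[k]
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train_idx, test_idx
```

Each document appears in the test set exactly once across the five rounds, so predictions over all folds can be compared against the true labels for the whole corpus.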
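The input-length limits reported in the experiment setup (documents capped at 50 sentences, sentences capped at 120 tokens) can be illustrated with a small truncation helper. This is a sketch under the assumption that a document is represented as a list of token lists; the function name is hypothetical and tokenisation details are omitted.

```python
# Hedged sketch of the input-length limits reported in the paper:
# each document is clipped to 50 sentences, each sentence to 120 tokens.
# The paper's preprocessing pipeline (Stanford CoreNLP tokenisation,
# word2vec embeddings) is not reproduced here.

MAX_SENTENCES = 50   # document length limit from the paper
MAX_TOKENS = 120     # sentence length limit from the paper

def truncate_document(sentences):
    """Clip a document (a list of token lists) to the reported limits."""
    return [tokens[:MAX_TOKENS] for tokens in sentences[:MAX_SENTENCES]]
```

Fixed caps like these let every document be padded or clipped to a uniform tensor shape before training the network for the stated 10 epochs.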