Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
AffectiveTweets: a Weka Package for Analyzing Affect in Tweets
Authors: Felipe Bravo-Marquez, Eibe Frank, Bernhard Pfahringer, Saif M. Mohammad
JMLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | For demonstration, we benchmark AffectiveTweets against similar and equivalent models built using the NLTK sentiment analysis module and Scikit-learn (Pedregosa et al., 2011) on the dataset from the SemEval-2013 Sentiment Analysis in Twitter Message Polarity Classification task (Nakov et al., 2013). Classification results on the testing partition and execution times are shown in Table 1. |
| Researcher Affiliation | Academia | Felipe Bravo-Marquez, Department of Computer Science, University of Chile & IMFD, Santiago, Chile; Eibe Frank and Bernhard Pfahringer, Department of Computer Science, University of Waikato, Hamilton, New Zealand; Saif M. Mohammad, National Research Council Canada, Ottawa, ON, Canada |
| Pseudocode | No | The paper describes the functionalities of the package as Weka filters and explains how they work (e.g., 'The TweetToSparseFeatureVector filter calculates several sparse features for every tweet'), but it does not include any structured pseudocode or algorithm blocks. |
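The TweetToSparseFeatureVector filter quoted above maps each tweet to a sparse vector of token-based features. The package itself is implemented in Java for Weka; as a purely illustrative sketch of the general idea (not the package's actual code, and omitting its richer feature types such as character n-grams, POS tags, and Brown clusters), a minimal word n-gram extractor might look like:

```python
from collections import Counter

def ngram_features(tweet, max_n=2):
    """Map a tweet to a sparse dict of word n-gram counts.

    Illustrative only: the real TweetToSparseFeatureVector filter
    (Java/Weka) supports many more feature types and tweet-aware
    tokenization; this sketch just splits on whitespace.
    """
    tokens = tweet.lower().split()
    feats = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            feats["NGRAM=" + "_".join(tokens[i:i + n])] += 1
    return dict(feats)

# e.g. unigrams and bigrams for a short tweet
features = ngram_features("I love this phone")
```

Keeping the features as a dict of nonzero entries mirrors the sparse representation Weka uses, which matters for the large vocabularies typical of Twitter data.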
| Open Source Code | Yes | The software is implemented as a Weka package that can be installed with the Weka package manager. It can be accessed through Weka's GUIs or the command-line interface. It is licensed under the GNU General Public License, Version 3, and hosted on GitHub. |
| Open Datasets | Yes | The package was used by several teams in the shared tasks: EmoInt 2017 and Affect in Tweets, SemEval-2018 Task 1. ... This list includes AFINN (Årup Nielsen, 2011), the Sentiment140 lexicon (Kiritchenko et al., 2014), and others. ... For demonstration, we benchmark AffectiveTweets against similar and equivalent models built using the NLTK sentiment analysis module and Scikit-learn (Pedregosa et al., 2011) on the dataset from the SemEval-2013 Sentiment Analysis in Twitter Message Polarity Classification task (Nakov et al., 2013). |
| Dataset Splits | Yes | For demonstration, we benchmark AffectiveTweets against similar and equivalent models built using the NLTK sentiment analysis module and Scikit-learn (Pedregosa et al., 2011) on the dataset from the SemEval-2013 Sentiment Analysis in Twitter Message Polarity Classification task (Nakov et al., 2013). Classification results on the testing partition and execution times are shown in Table 1. |
| Hardware Specification | No | The paper reports execution times in Table 1, but it does not provide any specific hardware details such as CPU, GPU, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions several software packages and libraries used, such as 'Weka (Hall et al., 2009)', 'Weka Deeplearning4j package (Lang et al., 2019)', 'Tweet NLP library (Gimpel et al., 2011)', 'NLTK (Bird and Loper, 2004)', and 'Scikit-learn (Pedregosa et al., 2011)'. However, it does not provide specific version numbers for these components. |
| Experiment Setup | No | Table 1 states 'Each model consists of a logistic regression trained on the corresponding features.' However, the paper does not provide specific hyperparameters for the logistic regression or other detailed training configurations such as learning rates, batch sizes, or number of epochs. |
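The paper specifies only that each benchmark model is a logistic regression trained on the corresponding features, without reporting hyperparameters. A minimal stdlib-only sketch of such a model over sparse feature dicts is shown below; the learning rate and epoch count are placeholder assumptions, not values from the paper, and the original benchmarks would instead use Weka's or scikit-learn's built-in solvers.

```python
import math

def train_logistic_regression(X, y, lr=0.1, epochs=200):
    """Binary logistic regression via stochastic gradient descent.

    X: list of sparse feature dicts (feature name -> value),
    y: list of 0/1 labels. lr and epochs are illustrative
    placeholders; the paper does not report its hyperparameters.
    """
    w = {}
    b = 0.0
    for _ in range(epochs):
        for feats, label in zip(X, y):
            z = b + sum(w.get(f, 0.0) * v for f, v in feats.items())
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - label  # gradient of the log loss w.r.t. z
            b -= lr * g
            for f, v in feats.items():
                w[f] = w.get(f, 0.0) - lr * g * v
    return w, b

def predict(w, b, feats):
    """Return the 0/1 class with a decision threshold at z = 0."""
    z = b + sum(w.get(f, 0.0) * v for f, v in feats.items())
    return 1 if z > 0 else 0
```

Storing the weight vector as a dict keeps the model sparse, matching the sparse n-gram and lexicon features the paper's models are trained on.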