Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Sentiment Analysis of Short Informal Texts

Authors: S. Kiritchenko, X. Zhu, S. M. Mohammad

JAIR 2014 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We describe a state-of-the-art sentiment analysis system that detects... The system is based on a supervised statistical text classification approach... The system ranked first in the SemEval-2013 shared task Sentiment Analysis in Twitter (Task 2)... The ablation experiments demonstrate that the use of the automatically generated lexicons results in performance gains... Section 6 provides the results of the evaluation experiments.
Researcher Affiliation	Academia	Svetlana Kiritchenko EMAIL Xiaodan Zhu EMAIL Saif M. Mohammad EMAIL National Research Council Canada 1200 Montreal Rd., Ottawa, ON, Canada
Pseudocode	No	The paper only describes methods and calculations using regular prose and mathematical equations (e.g., equations 1, 2, 3), without any explicitly labeled pseudocode or algorithm blocks.
Open Source Code	No	The paper states, 'All automatic lexicons described in the paper are made available to the research community.2' with footnote 2 providing 'www.purl.com/net/sentimentoftweets'. However, there is no explicit statement or link indicating that the source code for the overall sentiment analysis system methodology described in the paper is open-sourced. The paper only mentions that the system 'can be replicated using freely available resources' and acknowledges 'Colin Cherry for providing his SVM code', implying third-party tools rather than the authors' own implementation being released.
Open Datasets	Yes	In this work, we follow the definition of the task and use the data provided for the SemEval-2013 competition: Sentiment Analysis in Twitter (Wilson et al., 2013). This competition had two tasks: a message-level task and a term-level task. The Sentiment140 Corpus (Go et al., 2009) is a collection of 1.6 million tweets that contain emoticons. In addition to the SemEval-2013 datasets, we evaluate the system on a dataset of movie review excerpts (Socher et al., 2013).
Dataset Splits	Yes	The training set was distributed through tweet ids and a download script. However, not all tweets were accessible. For example, a Twitter user could have deleted her messages, and thus these messages would not be available. Table 1 shows the number of the training examples we were able to download. The development and test sets were provided in full by FTP. The dataset is comprised of 4,963 positive and 4,650 negative sentences split into the training (6,920 sentences), development (872 sentences), and test (1,821 sentences) sets.
Hardware Specification	No	The paper mentions, 'We recently annotated 135 million tweets over a cluster of 50 machines in 11 hours,' but it does not specify any particular hardware details such as CPU/GPU models, memory, or other specific machine specifications.
Software Dependencies	No	The paper mentions using a 'linear-kernel Support Vector Machine (SVM) (Chang & Lin, 2011) classifier', the 'CMU Twitter NLP tool (Gimpel et al., 2011)', and the 'Porter stemmer (Porter, 1980)'. However, it does not provide specific version numbers for any of these software components or libraries, which is required for reproducibility.
Experiment Setup	No	The paper describes the choice of a 'linear-kernel Support Vector Machine (SVM)' classifier and the feature sets used. However, it does not explicitly provide specific experimental setup details such as hyperparameter values (e.g., learning rate, batch size, regularization parameters for the SVM), optimization algorithm settings, or model initialization details that would be necessary for reproduction.
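
The Yes/No results in the table above can be tallied with a short script. The following is a minimal Python sketch, not the index's own scoring method, which is not described on this page. The row values are transcribed from the table, the function name `tally_results` is hypothetical, and the two categorical variables (Research Type, Researcher Affiliation) are left out by assumption.

```python
from collections import Counter

# Binary reproducibility variables transcribed from the table above.
# Research Type and Researcher Affiliation are categorical, so they are
# omitted here (an assumption made for this illustration).
ROWS = [
    ("Pseudocode", "No"),
    ("Open Source Code", "No"),
    ("Open Datasets", "Yes"),
    ("Dataset Splits", "Yes"),
    ("Hardware Specification", "No"),
    ("Software Dependencies", "No"),
    ("Experiment Setup", "No"),
]


def tally_results(rows):
    """Count how many variables were marked Yes and how many No."""
    return Counter(result for _, result in rows)


counts = tally_results(ROWS)
print(f"Yes: {counts['Yes']}, No: {counts['No']}")
```

A plain count like this only summarizes the table; it says nothing about how individual variables might be weighted in an overall score.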