Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Predicting and Analyzing Language Specificity in Social Media Posts

Authors: Yifan Gao, Yang Zhong, Daniel Preoţiuc-Pietro, Junyi Jessy Li | Pages 6415-6422

AAAI 2019

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We collect a dataset of over 7,000 tweets annotated with specificity on a fine-grained scale. Using this dataset, we train a supervised regression model that accurately estimates specificity in social media posts, reaching a mean absolute error of 0.3578 (for ratings on a scale of 1-5) and 0.73 Pearson correlation, significantly improving over baselines and previous sentence specificity prediction systems.
Researcher Affiliation	Collaboration	Yifan Gao, Yang Zhong, Daniel Preoţiuc-Pietro, Junyi Jessy Li Department of Mathematics, Department of Linguistics, The University of Texas at Austin {yifan233@,yang.zhong@,jessy@austin.}utexas.edu Bloomberg LP EMAIL
Pseudocode	No	The paper describes the model and features but does not contain structured pseudocode or algorithm blocks.
Open Source Code	Yes	Our dataset and system are openly available online.2 https://github.com/cs329yangzhong/specificity Twitter
Open Datasets	Yes	To this end, we first introduce a large dataset of 7,267 tweets annotated with text specificity on a fine-grained scale of 1-5. ... Our dataset and system are openly available online.2 https://github.com/cs329yangzhong/specificity Twitter
Dataset Splits	Yes	We use the dataset described in Section 3 for training (5,767 examples), validation/development (500 examples) and testing (1,000 examples).
Hardware Specification	No	The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies	No	The paper mentions software like 'Scikit-Learn', 'Stanford Name Entity Recognizer', and 'Stanford POS Tagger' but does not specify their version numbers.
Experiment Setup	Yes	Specifically, we use Support Vector Regression (SVR) with Radial Basis Function (RBF) kernel3 for sentence specificity prediction. ... The dimension for word embeddings and the number of Brown clusters are 100, tuned on validation set.
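
The split sizes and evaluation metrics quoted in the table can be illustrated with a short sketch. It checks that the reported splits (5,767 / 500 / 1,000) cover the 7,267 annotated tweets and shows how mean absolute error and Pearson correlation are computed for 1-5 specificity ratings. This is not the authors' code: the random shuffle, the seed, the function names, and the toy ratings are all assumptions made for illustration, and the paper excerpts above do not say how examples were assigned to splits.

```python
import math
import random

# Split sizes as reported in the "Dataset Splits" row of the table.
N_TRAIN, N_DEV, N_TEST = 5767, 500, 1000


def split_indices(n_total, seed=0):
    """Shuffle indices 0..n_total-1 and cut them into train/dev/test.

    The shuffle and the seed are assumptions; the source does not
    describe how the split was drawn.
    """
    if n_total != N_TRAIN + N_DEV + N_TEST:
        raise ValueError("split sizes do not cover the dataset")
    idx = list(range(n_total))
    random.Random(seed).shuffle(idx)
    return (
        idx[:N_TRAIN],
        idx[N_TRAIN:N_TRAIN + N_DEV],
        idx[N_TRAIN + N_DEV:],
    )


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between gold and predicted ratings."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# 7,267 is the dataset size quoted in the "Open Datasets" row.
train, dev, test = split_indices(7267)

# Hypothetical gold and predicted specificity ratings on the 1-5 scale.
gold = [1.0, 2.5, 3.0, 4.5, 5.0]
pred = [1.5, 2.0, 3.5, 4.0, 4.5]
mae = mean_absolute_error(gold, pred)
r = pearson_r(gold, pred)
```

A regression model such as the SVR with RBF kernel named in the Experiment Setup row would be fitted on `train`, tuned on `dev`, and scored with these two metrics on `test`.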