Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Topic Models to Infer Socio-Economic Maps

Authors: Lingzi Hong, Enrique Frias-Martinez, Vanessa Frias-Martinez

AAAI 2016

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	To evaluate the accuracy of the approaches proposed, we use two datasets: a large-scale spatio-temporal dataset containing one month of calling activity for three cities from the same country, and the socio-economic map for those three cities containing regional SEL information. ... Table 1 shows the results for the four approaches using regression to infer SELs as continuous values. The results reported are for 20 topics for PMB-LDA (RF and SVR) and 25 for PMBSEL-s LDA, which turned out to be the number of topics that had the best results in terms of accuracy (R²) as shown in Figure 2. ... On the other hand, Table 2 shows the accuracies and F1 scores for all four approaches when SELs are defined as three discrete classes: A, B and C (from high to low socio-economic level).
Researcher Affiliation	Collaboration	Lingzi Hong College of Information Studies University of Maryland EMAIL Enrique Frias-Martinez Telefonica Research Madrid, Spain EMAIL Vanessa Frias-Martinez College of Information Studies University of Maryland EMAIL
Pseudocode	Yes	Algorithm 1 PMBSEL-s LDA ... Algorithm 2 PMB-LDA ... Algorithm 3 PF
Open Source Code	No	The paper does not include any explicit statement about releasing source code for their methodology, nor does it provide any links to a code repository.
Open Datasets	No	For privacy reasons involving the use of cell phone data at large-scale, we cannot reveal the name of the country. The spatio-temporal dataset contains a total of 134M calls and 1.8M individuals; while the SEL map contains a total of 186 regions distributed across the three cities.
Dataset Splits	Yes	To test the accuracy of this approach, we randomly divide the set of regions in the area under study into a training and a testing set (75%-25%) and repeat it 100 times. We use the training set for PMBSEL-s LDA model estimation and the testing set for the inference of SELs from mobility motifs and topics, and report average accuracy values across all runs. For continuous SEL values, we report R² and RMSE.
Hardware Specification	No	The paper does not provide any specific hardware details (e.g., CPU, GPU models, memory) used for running the experiments.
Software Dependencies	No	The paper mentions "Libsvm" and "RFR" (Random Forest Regression) or "SVR" (Support Vector Regression) and "RF" (Random Forest), but it does not specify version numbers for these software components or libraries.
Experiment Setup	Yes	For Support Vector Regression, we used a Gaussian RBF kernel and the parameters (C, γ, ϵ) were selected using 5-fold cross validation to minimize the mean squared error. For Random Forest, the results are reported for 8 random trees in PMB-SEL; 146 trees in PF and 14 trees in PF2.
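
The evaluation protocol quoted under Dataset Splits (random 75%/25% train/test splits of the regions, repeated 100 times, with R² and RMSE averaged across runs) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the one-feature least-squares model is a placeholder standing in for the paper's topic-model pipeline, the data are synthetic, and every function name here (`repeated_holdout`, `fit_line`, `r2_score`, `rmse`) is hypothetical.

```python
import math
import random
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def fit_line(xs, ys):
    """Placeholder model (assumption): ordinary least squares on one feature."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: slope * x + intercept


def repeated_holdout(xs, ys, n_runs=100, train_frac=0.75, seed=0):
    """Average R^2 and RMSE over n_runs random train/test splits."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    n_train = int(round(train_frac * len(indices)))
    r2_values, rmse_values = [], []
    for _ in range(n_runs):
        # Reshuffle the region indices to draw a fresh random split.
        rng.shuffle(indices)
        train, test = indices[:n_train], indices[n_train:]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        y_test = [ys[i] for i in test]
        y_hat = [model(xs[i]) for i in test]
        r2_values.append(r2_score(y_test, y_hat))
        rmse_values.append(rmse(y_test, y_hat))
    return mean(r2_values), mean(rmse_values)


if __name__ == "__main__":
    # Synthetic stand-in for 186 regions; these are not the paper's data.
    data_rng = random.Random(42)
    xs = [data_rng.uniform(0, 10) for _ in range(186)]
    ys = [2.0 * x + 1.0 + data_rng.gauss(0, 1) for x in xs]
    avg_r2, avg_rmse = repeated_holdout(xs, ys)
    print(f"mean R2={avg_r2:.3f}, mean RMSE={avg_rmse:.3f}")
```

Averaging over many random splits reduces the dependence of the reported scores on any single partition, which matters when the sample is as small as the 186 regions mentioned under Open Datasets.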