Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Data Science for Social Good — 2014 KDD Highlights

Authors: Wei Wang

AAAI 2015

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	The KDD conference typically has an emphasis on research motivated by real-world applications. The breadth of topics covered in the 2014 research program is truly comprehensive and nicely balanced among social and information networks, data mining for social good, graph mining, statistical techniques for big data, topic modeling, recommender systems, data streams, scalable methods, Web mining, clustering, feature selection, applications to health care and medicine, public safety, advertising, social analytics, personalization, workforce analytics, health, and many more. [...] Li and co-authors (Li 2014) investigated this issue in the context of topic modeling. Sampling is employed in topic modeling inference in order to associate latent variables with observations. Leveraging the sparsity property, they proposed an efficient algorithm that approximates a dense, slowly changing distribution by the combination of a Metropolis-Hastings step, use of sparsity, and amortized constant time sampling via Walker's alias method. It scales linearly with the number of instantiated topics in the document rather than the total number of topics, leading to an order of magnitude speedup. This algorithm is generic and has wide applications in statistical modeling. This paper was recognized with the Best Research Paper Award.
Researcher Affiliation	Academia	Department of Computer Science, University of California, Los Angeles EMAIL
Pseudocode	No	No pseudocode or algorithm blocks are present in this paper.
Open Source Code	No	No statement or link indicating that open-source code for the content of this paper is available.
Open Datasets	No	The paper mentions 'Electronic health records (EHRs)' and other data types that were used by the summarized research papers, but provides no concrete access information (link, DOI, specific repository, or formal citation with author/year for public dataset) for any dataset.
Dataset Splits	No	No specific dataset split information (percentages, counts, or citations to predefined splits) is provided for any of the summarized experiments.
Hardware Specification	No	No specific hardware details (GPU/CPU models, processor types, or memory amounts) are mentioned for any of the summarized experiments.
Software Dependencies	No	No specific ancillary software details (library or solver names with version numbers) are mentioned for any of the summarized experiments.
Experiment Setup	No	No specific experimental setup details (hyperparameter values, training configurations, or system-level settings) are provided for any of the summarized experiments.
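The Research Type excerpt above summarizes a sampler (Li 2014) that approximates a dense, slowly changing distribution by combining a Metropolis-Hastings step with amortized constant time draws from Walker's alias method. The sketch below illustrates only those two generic ingredients on a small discrete distribution; it is not the authors' implementation and omits the sparsity decomposition and all topic-model-specific terms. The function names (build_alias_table, alias_draw, mh_alias_step) are hypothetical, all weights are assumed strictly positive, and "stale" weights stand in for the out-of-date distribution the alias table was built from.

```python
import random
from typing import Callable, List, Sequence, Tuple


def build_alias_table(weights: Sequence[float]) -> Tuple[List[float], List[int]]:
    """Build a Walker alias table (Vose's variant) in O(n) time.

    Assumes every weight is strictly positive.
    """
    n = len(weights)
    total = float(sum(weights))
    if n == 0 or total <= 0 or min(weights) <= 0:
        raise ValueError("weights must be non-empty and strictly positive")
    scaled = [w * n / total for w in weights]  # scaled weights average to 1
    prob = [1.0] * n          # probability of keeping column i's own outcome
    alias = list(range(n))    # outcome returned when column i is not kept
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        small_i, large_i = small.pop(), large.pop()
        prob[small_i] = scaled[small_i]   # column keeps its own mass...
        alias[small_i] = large_i          # ...and is topped up by large_i
        scaled[large_i] -= 1.0 - scaled[small_i]  # large_i donated the remainder
        (small if scaled[large_i] < 1.0 else large).append(large_i)
    # Leftover columns are full (up to rounding), so their prob stays 1.0.
    return prob, alias


def alias_draw(prob: List[float], alias: List[int], rng: random.Random) -> int:
    """Draw one index in O(1): pick a column uniformly, then keep it or take its alias."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]


def mh_alias_step(current: int,
                  target: Callable[[int], float],
                  stale_weights: Sequence[float],
                  prob: List[float], alias: List[int],
                  rng: random.Random) -> int:
    """One independence Metropolis-Hastings step.

    The proposal q is drawn from an alias table built from (possibly stale)
    weights; the acceptance test corrects for the mismatch with the current,
    unnormalized target p: accept with min(1, p(new) q(cur) / (p(cur) q(new))).
    Normalizing constants of p and q cancel in this ratio.
    """
    proposal = alias_draw(prob, alias, rng)
    ratio = (target(proposal) * stale_weights[current]) / (
        target(current) * stale_weights[proposal])
    return proposal if rng.random() < min(1.0, ratio) else current


if __name__ == "__main__":
    rng = random.Random(0)
    stale = [5.0, 3.0, 1.0, 1.0]   # weights the table was built from (reused)
    fresh = [4.0, 4.0, 1.0, 1.0]   # current target weights (have drifted)
    prob, alias = build_alias_table(stale)
    state, counts, steps = 0, [0] * len(fresh), 100_000
    for _ in range(steps):
        state = mh_alias_step(state, lambda k: fresh[k], stale, prob, alias, rng)
        counts[state] += 1
    # Empirical frequencies should be near fresh / sum(fresh).
    print([c / steps for c in counts])
```

The design choice mirrors the summary: building the table costs O(n), but reusing it for many draws amortizes that cost to O(1) per sample, while the Metropolis-Hastings acceptance test keeps the chain targeting the current distribution even though the table reflects slightly outdated weights.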