Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Temporal Topic Analysis with Endogenous and Exogenous Processes

Authors: Baiyang Wang, Diego Klabjan

AAAI 2016

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	The model is applied to two collections of documents to illustrate its empirical performance: online job advertisements from Direct Employers Association and journalists postings on Business Insider.com.
Researcher Affiliation	Academia	Department of Industrial Engineering and Management Sciences, Northwestern University, 2145 Sheridan Road, Evanston, Illinois, USA, 60208
Pseudocode	No	The paper describes the model and algorithm using mathematical equations and textual explanations, but does not provide structured pseudocode or algorithm blocks.
Open Source Code	No	The paper does not provide any explicit statements about releasing source code or links to a code repository for the described methodology.
Open Datasets	No	The paper states it uses "online job advertisements from Direct Employers Association" and "journalists postings on Business Insider.com". However, it does not provide concrete access information (e.g., specific links, DOIs, or citations to publicly available versions of these collected datasets).
Dataset Splits	No	The paper specifies training and testing dataset sizes and percentages for both case studies (e.g., 'The training data set consists of 40,449 advertisements, and the testing data set consists of 4,211 advertisements (9.4% of the sample)'). However, it does not explicitly mention a separate validation set or split.
Hardware Specification	No	The paper does not provide specific hardware details such as CPU/GPU models, memory specifications, or types of computing resources used for running the experiments.
Software Dependencies	No	The paper mentions software like "Java" and "R package stm" but does not provide specific version numbers for these or other software dependencies required to replicate the experiments.
Experiment Setup	Yes	We initialize the hyperparameters of LDA as follows: α = (50/K, . . . , 50/K), β = (0.01, . . . , 0.01)... For GCLDA, we let γ = 1, π0 = (1/K, . . . , 1/K), αt ~ iid Γ(1, 1), p(η) ∝ e^(−0.01 |ηk|), λ ~ Γ(1, 1), β = 0.01. We carry out the Metropolis-within-Gibbs algorithm... and run 5,000 iterations of the Markov chain with 1,000 burn-in samples... The number of topics is set to K = 50 for both data sets.