Graph-Sparse LDA: A Topic Model with Structured Sparsity

Authors: Finale Doshi-Velez, Byron Wallace, Ryan Adams

AAAI 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate that our Graph-Sparse LDA model finds interpretable, predictive topics on one toy example and two real-world examples from biomedical domains. In each case we compare our model with the state-of-the-art Bayesian nonparametric topic modeling approach LIDA (Archambeau, Lakshminarayanan, and Bouchard 2011). Figures 3a and 3b show the difference in the held-out test likelihoods for the final 50 samples over 20 independent instantiations of the toy problem.
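A minimal sketch of the evaluation protocol quoted above (averaging held-out log-likelihood over each chain's final 50 samples across 20 independent runs, then comparing models) follows. The arrays are random stand-ins, not the paper's results:

```python
import numpy as np

# Hypothetical per-sample held-out log-likelihoods, shape (chains, final samples);
# the values below are random stand-ins, not figures from the paper.
rng = np.random.default_rng(0)
n_chains, n_final = 20, 50
ll_gslda = rng.normal(-1000.0, 5.0, size=(n_chains, n_final))
ll_lida = rng.normal(-1020.0, 5.0, size=(n_chains, n_final))

# Average over each chain's final samples, then compare per chain.
diff = ll_gslda.mean(axis=1) - ll_lida.mean(axis=1)
print(f"mean held-out log-likelihood difference: {diff.mean():.1f} "
      f"(sd {diff.std(ddof=1):.1f} across {n_chains} chains)")
```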
Researcher Affiliation | Academia | Finale Doshi-Velez, Harvard University, Cambridge, MA 02138, finale@seas.harvard.edu; Byron C. Wallace, University of Texas at Austin, Austin, TX 78701, byron.wallace@utexas.edu; Ryan Adams, Harvard University, Cambridge, MA 02138, rpa@seas.harvard.edu
Pseudocode | No | In the supplementary materials, we derive a blocked-Gibbs sampler for B, B̄, A, Ā, and P (as well as for adding and deleting topics).
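The sampler's derivation lives in the paper's supplementary materials and is not reproduced in this report. As a generic illustration of the blocked-Gibbs idea (jointly resampling groups of variables from their full conditionals), here is a toy sampler for a 3-D Gaussian; the target distribution and the blocking are assumptions for illustration, not the paper's model:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.zeros(3)
Sigma = np.array([[1.0, 0.6, 0.3],
                  [0.6, 1.0, 0.5],
                  [0.3, 0.5, 1.0]])

def conditional(mu, Sigma, a, b, xb):
    """Mean and covariance of x[a] given x[b] = xb under a joint Gaussian."""
    Saa = Sigma[np.ix_(a, a)]
    Sab = Sigma[np.ix_(a, b)]
    Sbb = Sigma[np.ix_(b, b)]
    K = Sab @ np.linalg.inv(Sbb)
    return mu[a] + K @ (xb - mu[b]), Saa - K @ Sab.T

x = np.zeros(3)
samples = []
for _ in range(250):  # same iteration budget the paper reports
    # Block 1: jointly resample (x0, x1) given x2.
    m, C = conditional(mu, Sigma, [0, 1], [2], x[[2]])
    x[[0, 1]] = rng.multivariate_normal(m, C)
    # Block 2: resample x2 given (x0, x1).
    m, C = conditional(mu, Sigma, [2], [0, 1], x[[0, 1]])
    x[2] = rng.normal(m[0], np.sqrt(C[0, 0]))
    samples.append(x.copy())
```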
Open Source Code | No | The paper neither provides access to source code for the described methodology nor explicitly states that code is available.
Open Datasets | Yes | Autism Spectrum Disorder (ASD) is a complex, heterogeneous disease that is often accompanied by many co-occurring conditions such as epilepsy and intellectual disability. We consider a set of 3804 patients with 3626 different diagnoses, where the datum X_nw corresponds to the number of times patient n received diagnosis w during the first 15 years of life. Diagnoses are organized in a tree-structured hierarchy known as ICD-9-CM (Bodenreider 2004). The National Library of Medicine maintains a controlled structured vocabulary of Medical Subject Headings (MeSH) (Lipscomb 2000).
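As a sketch of the data representation described in this quote, the counts X_nw can be assembled into a patient-by-diagnosis matrix, with the ontology entering as a parent map over codes. The records and the tree fragment below are invented for illustration:

```python
from collections import Counter
import numpy as np

# Hypothetical (patient, ICD-9-CM code) diagnosis events.
records = [("patient_0", "345.90"), ("patient_0", "299.00"),
           ("patient_1", "299.00")]

patients = sorted({p for p, _ in records})
codes = sorted({c for _, c in records})
p_idx = {p: i for i, p in enumerate(patients)}
c_idx = {c: j for j, c in enumerate(codes)}

# X[n, w]: number of times patient n received diagnosis w.
X = np.zeros((len(patients), len(codes)), dtype=int)
for (p, c), k in Counter(records).items():
    X[p_idx[p], c_idx[c]] = k

# The tree-structured hierarchy can be carried as a child -> parent map
# over codes; this fragment is a toy stand-in, not the real ICD-9-CM tree.
parent = {"345.90": "345.9", "299.00": "299.0"}
```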
Dataset Splits | No | A random 1% of each dataset was held out to compute predictive log-likelihoods.
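The quote does not say whether whole documents or individual counts were held out, so the sketch below masks random cells of the count matrix as an assumption; the matrix itself is a random stand-in:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.poisson(0.05, size=(3804, 3626))      # stand-in count matrix

mask = rng.random(X.shape) < 0.01             # roughly 1% of cells held out
X_train = np.where(mask, 0, X)                # training counts with held-out cells zeroed
heldout_cells = np.argwhere(mask & (X > 0))   # (patient, diagnosis) cells to score at test time
```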
Hardware Specification | No | The paper does not provide specific hardware details (such as exact GPU/CPU models or memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions the models and algorithms used (e.g., LDA, LIDA, Gibbs sampler) but does not provide specific version numbers for any software dependencies or libraries required for replication.
Experiment Setup | Yes | We ran all samplers for 250 iterations. To reduce burn-in, the product AP was initialized using an LDA tensor decomposition (Anandkumar et al. 2012) and then factored into A and P using alternating minimization to find a sparse A that enforced the simplex and ontology constraints.
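A hedged sketch of the factorization step in this quote: alternating projected-gradient updates that keep the rows of A and P on the simplex. The LDA tensor-decomposition initialization, the pressure toward a sparse A, and the ontology constraint are omitted; the shapes, step size, and iteration count are assumptions:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of a vector onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u - css / np.arange(1, len(v) + 1) > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

rng = np.random.default_rng(0)
M = rng.dirichlet(np.ones(30), size=10)   # stand-in for the initialized product A @ P
A = rng.dirichlet(np.ones(5), size=10)    # 10 x 5, rows on the simplex
P = rng.dirichlet(np.ones(30), size=5)    # 5 x 30, rows on the simplex

for _ in range(200):
    A -= 0.01 * (A @ P - M) @ P.T                   # gradient step in A
    A = np.apply_along_axis(project_simplex, 1, A)  # re-project rows of A
    P -= 0.01 * A.T @ (A @ P - M)                   # gradient step in P
    P = np.apply_along_axis(project_simplex, 1, P)  # re-project rows of P
```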