Hierarchical Dirichlet Scaling Process

Authors: Dongwoo Kim, Alice Oh

ICML 2014

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Through experiments on synthetic datasets as well as datasets of newswire, medical journal articles, and Wikipedia, we show that the HDSP results in better predictive performance than HDP, labeled LDA, and partially labeled LDA." |
| Researcher Affiliation | Academia | Dongwoo Kim (DW.KIM@KAIST.AC.KR), KAIST, Daejeon, Korea; Alice Oh (ALICE.OH@KAIST.EDU), KAIST, Daejeon, Korea |
| Pseudocode | No | The paper describes the variational inference steps in text but does not include a structured pseudocode or algorithm block. |
| Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for its methodology. |
| Open Datasets | Yes | "We use two multi-labeled corpora, RCV [2], newswire from Reuters, and OHSUMED [3], a subset of the Medline journal articles, and one partially labeled corpus, Wikipedia." Footnote 2: http://trec.nist.gov/data/reuters/reuters.html Footnote 3: http://ir.ohsu.edu/ohsumed/ohsumed.html |
| Dataset Splits | Yes | "To measure the predictive performance, we leave 20% of the documents for testing and use the remaining 80% to train the models." |
| Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | "For all experiments, we set the truncation level T to 200. We terminate variational inference when the fractional change of the lower bound falls below 10⁻³, and we optimize all hyperparameters during inference except η. For the L-LDA and PLDA, we implement the collapsed Gibbs sampling algorithm. For each model, we run 5,000 iterations, the first 3,000 as burn-in, and then use the samples thereafter with gaps of 100 iterations. For PLDA, we set the number of topics for each label to two and five (PLDA-2, PLDA-5). We try five different values for the topic Dirichlet parameter η: 0.1, 0.25, 0.5, 0.75, 1.0. Finally, all results are averaged over 20 runs with different random initialization." |
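The stopping rule and sampling schedule quoted in the Experiment Setup row can be sketched as below. This is a minimal illustration under assumptions of our own: the function names and the exact form of the fractional-change test are hypothetical and do not come from the authors' code.

```python
def converged(prev_elbo: float, curr_elbo: float, tol: float = 1e-3) -> bool:
    """Terminate variational inference when the fractional change of the
    lower bound (ELBO) falls below tol (10^-3 in the paper)."""
    return abs(curr_elbo - prev_elbo) / abs(prev_elbo) < tol


def kept_iterations(total: int = 5000, burn_in: int = 3000, gap: int = 100):
    """Gibbs sampling schedule: run `total` iterations, discard the first
    `burn_in` as burn-in, then keep every `gap`-th sample thereafter."""
    return [t for t in range(1, total + 1)
            if t > burn_in and (t - burn_in) % gap == 0]
```

With the paper's settings, `kept_iterations()` retains 20 samples (iterations 3,100 through 5,000), which would then be averaged per run before averaging over the 20 random restarts.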