Hierarchical Dirichlet Scaling Process

Authors: Dongwoo Kim, Alice Oh

ICML 2014

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Through experiments on synthetic datasets as well as datasets of newswire, medical journal articles, and Wikipedia, we show that the HDSP results in better predictive performance than HDP, labeled LDA, and partially labeled LDA." |
| Researcher Affiliation | Academia | Dongwoo Kim (DW.KIM@KAIST.AC.KR), KAIST, Daejeon, Korea; Alice Oh (ALICE.OH@KAIST.EDU), KAIST, Daejeon, Korea |
| Pseudocode | No | The paper describes the variational inference steps in text but does not include a structured pseudocode or algorithm block. |
| Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for its methodology. |
| Open Datasets | Yes | "We use two multi-labeled corpora, RCV [2], newswire from Reuters, and OHSUMED [3], a subset of the Medline journal articles, and one partially labeled corpus, Wikipedia." Footnote 2: http://trec.nist.gov/data/reuters/reuters.html Footnote 3: http://ir.ohsu.edu/ohsumed/ohsumed.html |
| Dataset Splits | Yes | "To measure the predictive performance, we leave 20% of the documents for testing and use the remaining 80% to train the models." |
| Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | "For all experiments, we set the truncation level T to 200. We terminate variational inference when the fractional change of the lower bound falls below 10⁻³, and we optimize all hyperparameters during inference except η. For the L-LDA and PLDA, we implement the collapsed Gibbs sampling algorithm. For each model, we run 5,000 iterations, the first 3,000 as burn-in, and then use the samples thereafter with gaps of 100 iterations. For PLDA, we set the number of topics for each label to two and five (PLDA-2, PLDA-5). We try five different values for the topic Dirichlet parameter η: 0.1, 0.25, 0.5, 0.75, 1.0. Finally, all results are averaged over 20 runs with different random initialization." |
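The stopping rule and sampling schedule quoted in the Experiment Setup row can be sketched as below. This is a minimal illustration under assumptions of our own: the function names and the exact form of the fractional-change test are hypothetical and do not come from the authors' code.

```python
def converged(prev_elbo: float, curr_elbo: float, tol: float = 1e-3) -> bool:
    """Terminate variational inference when the fractional change of the
    lower bound (ELBO) falls below tol (10^-3 in the paper)."""
    return abs(curr_elbo - prev_elbo) / abs(prev_elbo) < tol


def kept_iterations(total: int = 5000, burn_in: int = 3000, gap: int = 100):
    """Gibbs sampling schedule: run `total` iterations, discard the first
    `burn_in` as burn-in, then keep every `gap`-th sample thereafter."""
    return [t for t in range(1, total + 1)
            if t > burn_in and (t - burn_in) % gap == 0]
```

With the paper's settings, `kept_iterations()` retains 20 samples (iterations 3,100 through 5,000), which would then be averaged per run before averaging over the 20 random restarts.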