Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
On collapsed representation of hierarchical Completely Random Measures
Authors: Gaurav Pandey, Ambedkar Dukkipati
ICML 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "5. Experimental results: We use hierarchical CRM-Poisson models for learning topics from the NIPS corpus." and "The perplexity for the hierarchical CRM-Poisson models as a function of training percentage is plotted in Figure 1." |
| Researcher Affiliation | Academia | Gaurav Pandey, Ambedkar Dukkipati; Department of Computer Science and Automation, Indian Institute of Science, Bangalore-560012, India |
| Pseudocode | No | The paper describes the steps for Gibbs sampling in a numbered list, but it does not present structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described, nor does it explicitly state that the code is being released. |
| Open Datasets | Yes | "We use hierarchical CRM-Poisson models for learning topics from the NIPS corpus." Footnote 1: "The dataset can be downloaded from http://psiexp.ss.uci.edu/research/programs_data/toolbox.htm" |
| Dataset Splits | No | The paper states: 'For evaluating the different models, we divide each document into a training section and a test section by independently sampling a boolean random variable for each word. The probability of sending the word to the training section is varied from 0.3 to 0.7.' It does not explicitly mention a validation split. |
| Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers. |
| Experiment Setup | Yes | "We run 2000 iterations of Gibbs sampling. The first 500 iterations are discarded, and every sample in every 5 iterations afterwards is used to update the document-specific distribution on topics and the topic specific distribution on words." and "For the case of GGP, the value of the discount parameter d is chosen from the set {0, .1, .2, .3, .4}. Furthermore, a gamma prior with rate parameter 2 and shape parameter 4 is defined on θ." and "For the case of SGGP, we consider m = 5, and d1 = 0, d2 = .1, ..., d5 = .4. Furthermore, independent gamma priors with rate parameter 2 and shape parameter 4 are defined for each θq, 1 ≤ q ≤ 5. The posterior of each parameter θq is sampled via uniform sampling." |
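The per-word train/test split quoted in the Dataset Splits row amounts to an independent Bernoulli draw for each token. The following is a minimal sketch, not the authors' code: the function name `split_document`, the toy document, the seed, and the intermediate probabilities 0.4, 0.5 and 0.6 are hypothetical (the paper only states that the training probability is varied from 0.3 to 0.7).

```python
import random

def split_document(words, p_train, rng):
    """Hypothetical helper: send each word to the training section with
    probability p_train, independently of every other word (one Bernoulli
    draw per token); all remaining words go to the test section."""
    train, test = [], []
    for w in words:
        (train if rng.random() < p_train else test).append(w)
    return train, test

# Toy document and seed are illustrative only.
rng = random.Random(0)
doc = ["topic", "model", "gibbs", "sampler", "poisson", "gamma"] * 50

# 0.3 and 0.7 come from the paper; the intermediate grid is an assumption.
for p in (0.3, 0.4, 0.5, 0.6, 0.7):
    train, test = split_document(doc, p, rng)
    print(f"p_train={p:.1f}: {len(train)} train / {len(test)} test words")
```

Because each word is assigned independently, the realized training fraction of any single document fluctuates around `p_train` rather than matching it exactly, and no validation section is produced.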
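The sampling schedule in the Experiment Setup row (2000 Gibbs iterations, the first 500 discarded as burn-in, then one sample every 5 iterations) fixes how many samples feed the topic estimates. The sketch below assumes 1-based iteration counting and that the first retained sample is the fifth post-burn-in iteration; the paper does not pin down either convention, and `retained_iterations` is a hypothetical helper.

```python
def retained_iterations(n_iter=2000, burn_in=500, thin=5):
    """Return the 1-based Gibbs iterations whose samples are kept:
    skip the first `burn_in` iterations, then keep every `thin`-th one."""
    return [t for t in range(burn_in + 1, n_iter + 1) if (t - burn_in) % thin == 0]

kept = retained_iterations()
print(len(kept), kept[:3], kept[-1])  # 300 [505, 510, 515] 2000
```

Under this convention, (2000 - 500) / 5 = 300 samples are averaged; starting at iteration 501 instead would shift the retained indices but leave the count unchanged.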
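The SGGP hyperparameters can be written out explicitly. Under the stated settings the discounts are d_q = 0.1(q - 1) for q = 1, ..., 5, and a gamma prior with shape 4 and rate 2 has mean 4/2 = 2 and variance 4/2² = 1. A practical pitfall when reproducing this in Python is that `random.gammavariate(alpha, beta)` treats `beta` as a scale, so the rate must be inverted. Drawing initial θ_q values from the prior (`draw_theta_from_prior`) is an illustrative assumption; the paper does not describe its initialization or the details of its uniform sampling step.

```python
import random

# SGGP setting from the table: m = 5 components, discounts d_1 = 0, ..., d_5 = 0.4.
M = 5
discounts = [round(0.1 * (q - 1), 1) for q in range(1, M + 1)]

# Gamma prior on each theta_q with shape 4 and rate 2.
SHAPE, RATE = 4.0, 2.0
prior_mean = SHAPE / RATE        # shape / rate
prior_var = SHAPE / RATE ** 2    # shape / rate^2

def draw_theta_from_prior(rng):
    """Hypothetical initializer: one independent prior draw per theta_q.
    random.gammavariate(alpha, beta) uses beta as a SCALE, so pass 1 / rate."""
    return [rng.gammavariate(SHAPE, 1.0 / RATE) for _ in range(M)]

rng = random.Random(42)
print(discounts, prior_mean, prior_var)  # [0.0, 0.1, 0.2, 0.3, 0.4] 2.0 1.0
print(draw_theta_from_prior(rng))
```

Passing `RATE` directly as the second argument would silently give a prior with mean 8 instead of 2, which is easy to miss when comparing against reported settings.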
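Figure 1 reports perplexity as a function of training percentage. A common definition of held-out perplexity for topic models is exp(-(1/N) Σ log p(w | d)), where N is the number of test words and p(w | d) = Σ_k θ_dk φ_kw. The sketch below implements that standard definition, assuming `theta` and `phi` are the document-topic and topic-word estimates averaged over the retained Gibbs samples; the paper's exact estimator for hierarchical CRM-Poisson models may differ, and all names here are hypothetical.

```python
import math

def heldout_perplexity(test_docs, theta, phi):
    """Return exp(-mean log p(w | d)) over all held-out words.

    test_docs[d] : list of word ids in the test section of document d
    theta[d][k]  : estimated p(topic k | document d)
    phi[k][w]    : estimated p(word w | topic k)
    """
    total_log_prob, n_words = 0.0, 0
    for d, words in enumerate(test_docs):
        for w in words:
            # Mixture predictive probability of word w in document d.
            p = sum(theta[d][k] * phi[k][w] for k in range(len(phi)))
            total_log_prob += math.log(p)
            n_words += 1
    if n_words == 0:
        raise ValueError("no held-out words")
    return math.exp(-total_log_prob / n_words)

# Sanity check: if every topic is uniform over V words, perplexity equals V.
V = 4
theta = [[0.5, 0.5]]
phi = [[1.0 / V] * V, [1.0 / V] * V]
print(heldout_perplexity([[0, 1, 2, 3]], theta, phi))
```

Lower values mean the model assigns higher probability to the held-out words; a model no better than uniform over the vocabulary scores exactly the vocabulary size.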