Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Fixed-Length Poisson MRF: Adding Dependencies to the Multinomial
Authors: David I. Inouye, Pradeep K. Ravikumar, Inderjit S. Dhillon
NeurIPS 2015 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show the effectiveness of our LPMRF distribution over Multinomial models by evaluating the test set perplexity on a dataset of abstracts and Wikipedia. Qualitatively, we show that the positive dependencies discovered by LPMRF are interesting and intuitive. |
| Researcher Affiliation | Academia | David I. Inouye, Pradeep Ravikumar, Inderjit S. Dhillon; Department of Computer Science, University of Texas at Austin; EMAIL |
| Pseudocode | No | The paper describes algorithms in text but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Finally, we show that our algorithms are fast and have good scaling (code available online). |
| Open Datasets | Yes | We evaluated our novel LPMRF model using perplexity on a held-out test set of documents from a corpus composed of research paper abstracts³ denoted Classic3 and a collection of Wikipedia documents. (Footnote 3: http://ir.dcs.gla.ac.uk/resources/test_collections/) |
| Dataset Splits | No | We train all the models using a 90% training split of the documents and compute the held-out perplexity on the remaining 10% where perplexity is equal to $\exp\left(-\mathcal{L}(X^{\text{test}} \mid \theta^{1 \dots k}, \Phi^{1 \dots k}) / N_{\text{test}}\right)$, where $\mathcal{L}$ is the log likelihood and $N_{\text{test}}$ is the total number of words in the test set. The paper specifies a 90/10 train/test split but does not explicitly mention a separate validation split. |
| Hardware Specification | Yes | All timing experiments were conducted on the TACC Maverick system with Intel Xeon E5-2680 v2 Ivy Bridge CPUs (2.80 GHz), 20 CPUs per node, and 12.8 GB memory per CPU (https://www.tacc.utexas.edu/). |
| Software Dependencies | No | In C++, we implemented the three core algorithms... trivially parallelized using OpenMP (http://openmp.org/). For LDA, we used... a MATLAB. The paper mentions software but does not specify version numbers for any of them. |
| Experiment Setup | Yes | For a single Multinomial or LPMRF, we set the smoothing parameter β to $10^{-4}$. We select the LPMRF models using all combinations of 20 log spaced λ between 1 and $10^{-3}$, and 5 linearly spaced weighting function constants c between 1 and 2... For LDA, we used 2000 iterations and optimized the hyperparameters α and β using the likelihood of a tuning set. |
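
The evaluation protocol quoted in the Dataset Splits and Experiment Setup rows can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' C++ code: the helper names (`logspace`, `linspace`, `train_test_split`, `perplexity`) are hypothetical, the shuffle seed is arbitrary, and the λ range is taken to run from 1 down to 10^-3, which is an assumption about the sign of the exponent. Perplexity is computed in the standard way, as the exponential of the negative per-word log likelihood.

```python
import math
import random
from itertools import product


def logspace(start_exp, stop_exp, num):
    """Return `num` values log-spaced from 10**start_exp to 10**stop_exp."""
    step = (stop_exp - start_exp) / (num - 1)
    return [10 ** (start_exp + i * step) for i in range(num)]


def linspace(start, stop, num):
    """Return `num` values linearly spaced from `start` to `stop`."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def train_test_split(docs, train_frac=0.9, seed=0):
    """Shuffle documents and split them into train and held-out test sets.

    The seed is an arbitrary choice for this sketch; the quoted text
    only states a 90% / 10% split.
    """
    shuffled = list(docs)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_frac * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def perplexity(log_likelihood, n_test_words):
    """Perplexity = exp(-L / N_test), with L the test log likelihood."""
    return math.exp(-log_likelihood / n_test_words)


# Hyperparameter grid: 20 log-spaced lambda values (assumed 1 down to 1e-3)
# crossed with 5 linearly spaced weighting-function constants c in [1, 2].
lambdas = logspace(0, -3, 20)
cs = linspace(1.0, 2.0, 5)
grid = list(product(lambdas, cs))

# 90/10 split over 100 placeholder document ids.
train_docs, test_docs = train_test_split(range(100))

# Sanity check: a uniform model over V words assigns each word
# probability 1/V, so its perplexity should equal V.
V = 50
n_words = 1000
uniform_ll = n_words * math.log(1.0 / V)
uniform_ppl = perplexity(uniform_ll, n_words)
```

The grid contains 20 × 5 = 100 (λ, c) combinations. The uniform-model check is a quick way to confirm the sign convention: a model that spreads probability evenly over V words should report a perplexity of V.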