Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Large-Scale Stochastic Sampling from the Probability Simplex
Authors: Jack Baker, Paul Fearnhead, Emily Fox, Christopher Nemeth
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply SCIR and SGRLD to LDA on a dataset of scraped Wikipedia documents... We apply this model to the anonymous Microsoft user dataset... Performance is evaluated by measuring the predictive performance of the trained model on a held out test set... |
| Researcher Affiliation | Academia | Jack Baker, STOR-i CDT, Mathematics and Statistics, Lancaster University; Paul Fearnhead, Mathematics and Statistics, Lancaster University; Emily B. Fox, Computer Science & Engineering and Statistics, University of Washington; Christopher Nemeth, Mathematics and Statistics, Lancaster University |
| Pseudocode | Yes | Algorithm 1: Stochastic Cox-Ingersoll-Ross (SCIR) for sampling from the probability simplex. |
| Open Source Code | Yes | Code available at https://github.com/jbaker92/scir. |
| Open Datasets | Yes | We apply SCIR and SGRLD to LDA on a dataset of scraped Wikipedia documents, by adapting the code released by Patterson and Teh (2013)... We apply this model to the anonymous Microsoft user dataset (Breese et al., 1998). |
| Dataset Splits | No | No specific details on train/validation/test dataset splits (exact percentages, sample counts, or detailed splitting methodology) were provided, beyond mentioning a held-out test set. |
| Hardware Specification | No | The paper does not report the hardware used to run the experiments (e.g., exact GPU/CPU models, memory, or other machine specifications). |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x) that are needed to replicate the experiments. |
| Experiment Setup | Yes | At each iteration a minibatch of 50 documents is sampled in an online manner. |
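The table only names Algorithm 1 (SCIR); the sketch below is a hypothetical illustration of the idea, not the code in the jbaker92/scir repository. It assumes a Dirichlet-multinomial model with prior concentrations α over d categories, represents a point on the simplex as normalized Gamma variables ω_j, and advances each ω_j with the exact transition of a Cox-Ingersoll-Ross process dω = (a − ω)dt + √(2ω)dW, whose stationary law is Gamma(a, 1). The minibatch enters by replacing a_j with the estimate α_j + (N/n)·(count of category j in the minibatch). The function names, the step size `h`, and the toy data are illustrative assumptions.

```python
import numpy as np


def cir_exact_step(omega, a, h, rng):
    """Draw omega(t + h) given omega(t) for d omega = (a - omega) dt + sqrt(2 omega) dW.

    Uses the exact scaled noncentral chi-squared transition of the CIR process,
    so there is no time-discretization error for a fixed drift parameter a.
    """
    decay = np.exp(-h)
    scale = (1.0 - decay) / 2.0
    nonc = 2.0 * omega * decay / (1.0 - decay)
    return scale * rng.noncentral_chisquare(df=2.0 * a, nonc=nonc)


def scir_simplex_sampler(data, alpha, batch_size, h, n_iter, seed=0):
    """Hypothetical SCIR-style sampler for a Dirichlet-multinomial model.

    data:  integer category labels in {0, ..., d-1}, length N
    alpha: Dirichlet prior concentrations, shape (d,), all > 0
    Returns an (n_iter, d) array of points on the probability simplex.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    alpha = np.asarray(alpha, dtype=float)
    n_data, d = data.shape[0], alpha.shape[0]
    omega = rng.gamma(alpha)  # start from the prior's Gamma representation
    samples = np.empty((n_iter, d))
    for t in range(n_iter):
        batch = rng.choice(n_data, size=batch_size, replace=False)
        # Unbiased minibatch estimate of the full-data category counts.
        counts_hat = (n_data / batch_size) * np.bincount(data[batch], minlength=d)
        omega = cir_exact_step(omega, alpha + counts_hat, h, rng)
        samples[t] = omega / omega.sum()  # map the Gamma-scale state onto the simplex
    return samples
```

Because the CIR transition is simulated exactly, `h` only controls how far each update moves. With the full-data counts in place of the minibatch estimate, this chain would leave the Dirichlet posterior invariant, so in this sketch the remaining approximation error comes from minibatch noise in the drift parameter.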