Large-Scale Stochastic Sampling from the Probability Simplex

Authors: Jack Baker, Paul Fearnhead, Emily Fox, Christopher Nemeth

NeurIPS 2018

Reproducibility assessment: each variable is listed with its assessed result and the supporting LLM response.

Research Type: Experimental
  LLM Response: "We apply SCIR and SGRLD to LDA on a dataset of scraped Wikipedia documents..." "We apply this model to the anonymous Microsoft user dataset..." "Performance is evaluated by measuring the predictive performance of the trained model on a held out test set..."

Researcher Affiliation: Academia
  LLM Response: "Jack Baker, STOR-i CDT, Mathematics and Statistics, Lancaster University, j.baker1@lancaster.ac.uk; Paul Fearnhead, Mathematics and Statistics, Lancaster University, p.fearnhead@lancaster.ac.uk; Emily B. Fox, Computer Science & Engineering and Statistics, University of Washington, ebfox@uw.edu; Christopher Nemeth, Mathematics and Statistics, Lancaster University, c.nemeth@lancaster.ac.uk"

Pseudocode: Yes
  LLM Response: "Algorithm 1: Stochastic Cox-Ingersoll-Ross (SCIR) for sampling from the probability simplex." (See the SCIR update sketch after this table.)

Open Source Code: Yes
  LLM Response: "Code available at https://github.com/jbaker92/scir."

Open Datasets: Yes
  LLM Response: "We apply SCIR and SGRLD to LDA on a dataset of scraped Wikipedia documents, by adapting the code released by Patterson and Teh (2013)..." "We apply this model to the anonymous Microsoft user dataset (Breese et al., 1998)."

Dataset Splits: No
  LLM Response: No specific train/validation/test split details (exact percentages, sample counts, or splitting methodology) are provided beyond the mention of a held-out test set.

Hardware Specification: No
  LLM Response: No hardware details (e.g., GPU/CPU models, memory, or machine specifications) used to run the experiments are mentioned.

Software Dependencies: No
  LLM Response: The paper does not list software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x) needed to replicate the experiments.

Experiment Setup: Yes
  LLM Response: "At each iteration a minibatch of 50 documents is sampled in an online manner." (See the online minibatch loop sketch after this table.)

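The paper's Algorithm 1 is not reproduced on this page. As a rough illustration of the kind of update it describes, the sketch below advances a Cox-Ingersoll-Ross process whose stationary law is Gamma(a, 1) by one exact transition (a scaled noncentral chi-squared draw), with the shape parameter replaced by a minibatch estimate. This is a minimal sketch assuming the standard CIR transition and a Gamma(a, 1) parameterisation; the names scir_step, b, and h are illustrative, not the paper's.

```python
import numpy as np

def scir_step(x, a_hat, b=1.0, h=0.1, rng=None):
    """One exact CIR transition whose stationary law is Gamma(a_hat, 1).

    x     : current state (array of unnormalised Gamma variables)
    a_hat : stochastic (minibatch) estimate of the Gamma shape parameter
    b, h  : mean-reversion rate and step size (illustrative defaults)
    """
    rng = np.random.default_rng() if rng is None else rng
    c = (1.0 - np.exp(-b * h)) / 2.0   # transition scale
    lam = x * np.exp(-b * h) / c       # noncentrality parameter
    # Exact transition: scaled noncentral chi-squared with 2 * a_hat d.o.f.
    return c * rng.noncentral_chisquare(2.0 * a_hat, lam)

# Normalising independent Gamma draws gives a point on the simplex
# (the Gamma representation of the Dirichlet distribution).
rng = np.random.default_rng(42)
a_hat = np.array([1.0, 2.0, 3.0])
x = rng.gamma(a_hat)               # initialise from Gamma(a_hat, 1)
x = scir_step(x, a_hat, rng=rng)
w = x / x.sum()                    # sample on the probability simplex
```

Because the transition is exact, the update introduces no discretisation error; the stochasticity enters only through the minibatch estimate a_hat.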
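For the experiment-setup detail, a hypothetical online loop matching the reported minibatch size might look like the following. Everything here is assumed for illustration: the toy corpus, the prior value alpha, and the scaled count estimate n_hat; only the minibatch size of 50 comes from the paper. The update uses scir_step from the previous sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n_docs, vocab, minibatch = 1_000, 100, 50        # only minibatch=50 is from the paper
corpus = rng.poisson(1.0, size=(n_docs, vocab))  # toy per-document word counts

alpha = 0.1 * np.ones(vocab)   # Dirichlet prior (illustrative value)
x = rng.gamma(alpha)           # unnormalised Gamma state

for _ in range(200):
    docs = rng.choice(n_docs, size=minibatch, replace=False)
    # Unbiased estimate of the full-data counts from the minibatch.
    n_hat = (n_docs / minibatch) * corpus[docs].sum(axis=0)
    # One SCIR update with the estimated shape (scir_step defined above).
    x = scir_step(x, alpha + n_hat, rng=rng)
    w = x / x.sum()            # current sample on the probability simplex
```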