Large-Scale Stochastic Sampling from the Probability Simplex

Authors: Jack Baker, Paul Fearnhead, Emily Fox, Christopher Nemeth

NeurIPS 2018

Reproducibility assessment: each variable is listed with its assessed result and the supporting LLM response.

Research Type: Experimental
  LLM Response: "We apply SCIR and SGRLD to LDA on a dataset of scraped Wikipedia documents..." "We apply this model to the anonymous Microsoft user dataset..." "Performance is evaluated by measuring the predictive performance of the trained model on a held out test set..."

Researcher Affiliation: Academia
  LLM Response: "Jack Baker, STOR-i CDT, Mathematics and Statistics, Lancaster University, j.baker1@lancaster.ac.uk; Paul Fearnhead, Mathematics and Statistics, Lancaster University, p.fearnhead@lancaster.ac.uk; Emily B. Fox, Computer Science & Engineering and Statistics, University of Washington, ebfox@uw.edu; Christopher Nemeth, Mathematics and Statistics, Lancaster University, c.nemeth@lancaster.ac.uk"

Pseudocode: Yes
  LLM Response: "Algorithm 1: Stochastic Cox-Ingersoll-Ross (SCIR) for sampling from the probability simplex." (See the SCIR update sketch after this table.)

Open Source Code: Yes
  LLM Response: "Code available at https://github.com/jbaker92/scir."

Open Datasets: Yes
  LLM Response: "We apply SCIR and SGRLD to LDA on a dataset of scraped Wikipedia documents, by adapting the code released by Patterson and Teh (2013)..." "We apply this model to the anonymous Microsoft user dataset (Breese et al., 1998)."

Dataset Splits: No
  LLM Response: No specific train/validation/test split details (exact percentages, sample counts, or splitting methodology) are provided beyond the mention of a held-out test set.

Hardware Specification: No
  LLM Response: No hardware details (e.g., GPU/CPU models, memory, or machine specifications) used to run the experiments are mentioned.

Software Dependencies: No
  LLM Response: The paper does not list software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x) needed to replicate the experiments.

Experiment Setup: Yes
  LLM Response: "At each iteration a minibatch of 50 documents is sampled in an online manner." (See the online minibatch loop sketch after this table.)

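The paper's Algorithm 1 is not reproduced on this page. As a rough illustration of the kind of update it describes, the sketch below advances a Cox-Ingersoll-Ross process whose stationary law is Gamma(a, 1) by one exact transition (a scaled noncentral chi-squared draw), with the shape parameter replaced by a minibatch estimate. This is a minimal sketch assuming the standard CIR transition and a Gamma(a, 1) parameterisation; the names scir_step, b, and h are illustrative, not the paper's.

```python
import numpy as np

def scir_step(x, a_hat, b=1.0, h=0.1, rng=None):
    """One exact CIR transition whose stationary law is Gamma(a_hat, 1).

    x     : current state (array of unnormalised Gamma variables)
    a_hat : stochastic (minibatch) estimate of the Gamma shape parameter
    b, h  : mean-reversion rate and step size (illustrative defaults)
    """
    rng = np.random.default_rng() if rng is None else rng
    c = (1.0 - np.exp(-b * h)) / 2.0   # transition scale
    lam = x * np.exp(-b * h) / c       # noncentrality parameter
    # Exact transition: scaled noncentral chi-squared with 2 * a_hat d.o.f.
    return c * rng.noncentral_chisquare(2.0 * a_hat, lam)

# Normalising independent Gamma draws gives a point on the simplex
# (the Gamma representation of the Dirichlet distribution).
rng = np.random.default_rng(42)
a_hat = np.array([1.0, 2.0, 3.0])
x = rng.gamma(a_hat)               # initialise from Gamma(a_hat, 1)
x = scir_step(x, a_hat, rng=rng)
w = x / x.sum()                    # sample on the probability simplex
```

Because the transition is exact, the update introduces no discretisation error; the stochasticity enters only through the minibatch estimate a_hat.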
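For the experiment-setup detail, a hypothetical online loop matching the reported minibatch size might look like the following. Everything here is assumed for illustration: the toy corpus, the prior value alpha, and the scaled count estimate n_hat; only the minibatch size of 50 comes from the paper. The update uses scir_step from the previous sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n_docs, vocab, minibatch = 1_000, 100, 50        # only minibatch=50 is from the paper
corpus = rng.poisson(1.0, size=(n_docs, vocab))  # toy per-document word counts

alpha = 0.1 * np.ones(vocab)   # Dirichlet prior (illustrative value)
x = rng.gamma(alpha)           # unnormalised Gamma state

for _ in range(200):
    docs = rng.choice(n_docs, size=minibatch, replace=False)
    # Unbiased estimate of the full-data counts from the minibatch.
    n_hat = (n_docs / minibatch) * corpus[docs].sum(axis=0)
    # One SCIR update with the estimated shape (scir_step defined above).
    x = scir_step(x, alpha + n_hat, rng=rng)
    w = x / x.sum()            # current sample on the probability simplex
```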