Large-Scale Stochastic Sampling from the Probability Simplex
Authors: Jack Baker, Paul Fearnhead, Emily Fox, Christopher Nemeth
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply SCIR and SGRLD to LDA on a dataset of scraped Wikipedia documents... We apply this model to the anonymous Microsoft user dataset... Performance is evaluated by measuring the predictive performance of the trained model on a held out test set... |
| Researcher Affiliation | Academia | Jack Baker (STOR-i CDT, Mathematics and Statistics, Lancaster University, j.baker1@lancaster.ac.uk); Paul Fearnhead (Mathematics and Statistics, Lancaster University, p.fearnhead@lancaster.ac.uk); Emily B. Fox (Computer Science & Engineering and Statistics, University of Washington, ebfox@uw.edu); Christopher Nemeth (Mathematics and Statistics, Lancaster University, c.nemeth@lancaster.ac.uk) |
| Pseudocode | Yes | Algorithm 1: Stochastic Cox-Ingersoll-Ross (SCIR) for sampling from the probability simplex. |
| Open Source Code | Yes | Code available at https://github.com/jbaker92/scir. |
| Open Datasets | Yes | We apply SCIR and SGRLD to LDA on a dataset of scraped Wikipedia documents, by adapting the code released by Patterson and Teh (2013)... We apply this model to the anonymous Microsoft user dataset (Breese et al., 1998). |
| Dataset Splits | No | No specific details on train/validation/test dataset splits (exact percentages, sample counts, or detailed splitting methodology) were provided, beyond mentioning a held-out test set. |
| Hardware Specification | No | No specific hardware details (e.g., exact GPU/CPU models, memory, or detailed computer specifications) used for running experiments were mentioned. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x) that are needed to replicate the experiments. |
| Experiment Setup | Yes | At each iteration a minibatch of 50 documents is sampled in an online manner. |
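
The Pseudocode and Experiment Setup rows refer to a minibatch-driven SCIR sampler without showing what one update looks like. The sketch below is an illustration only and is not taken from the authors' repository. It assumes the standard exact transition of a Cox-Ingersoll-Ross process with a Gamma(a, 1) stationary law, assumes the Gamma shape is estimated from a minibatch as the prior plus rescaled counts, and assumes the simplex sample is obtained by normalizing the Gamma-distributed components. The names `sample_poisson`, `scir_step`, `scir_simplex_chain`, the step size `h`, and the toy count data are all hypothetical.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw a Poisson(lam) variate by counting rate-1 arrivals in [0, lam]."""
    count = 0
    total = rng.expovariate(1.0)
    while total < lam:
        count += 1
        total += rng.expovariate(1.0)
    return count


def scir_step(theta, a_hat, h, rng):
    """One exact CIR transition of length h toward Gamma(a_hat, 1).

    The transition is a scaled noncentral chi-square, sampled here as a
    Poisson mixture of Gamma variates (assumed form, see lead-in).
    """
    decay = math.exp(-h)
    j = sample_poisson(theta * decay / (1.0 - decay), rng)
    return (1.0 - decay) * rng.gammavariate(a_hat + j, 1.0)


def scir_simplex_chain(counts, alpha, h, batch_size, n_iters, seed=0):
    """Run a minibatch chain and return normalized (simplex) samples.

    counts: list of per-item count vectors (hypothetical toy data).
    alpha:  Dirichlet prior parameters, one per simplex component.
    """
    rng = random.Random(seed)
    n_items, dim = len(counts), len(alpha)
    theta = [rng.gammavariate(a, 1.0) for a in alpha]
    samples = []
    for _ in range(n_iters):
        # Subsample items without replacement and rescale their counts
        # so the shape estimate is unbiased for the full-data shape.
        batch = rng.sample(range(n_items), batch_size)
        scale = n_items / batch_size
        for j in range(dim):
            a_hat = alpha[j] + scale * sum(counts[i][j] for i in batch)
            theta[j] = scir_step(theta[j], a_hat, h, rng)
        total = sum(theta)
        samples.append([t / total for t in theta])
    return samples


if __name__ == "__main__":
    # Toy data: 20 items, each a one-hot count over 3 categories.
    toy_counts = [[1 if i % 3 == k else 0 for k in range(3)] for i in range(20)]
    chain = scir_simplex_chain(toy_counts, [1.0, 1.0, 1.0], h=0.5,
                               batch_size=5, n_iters=100, seed=1)
    print(chain[-1])
```

Because each component is drawn from an exact transition rather than a discretized one, every `theta[j]` stays positive and the normalized vector always lies on the simplex. The reported LDA experiment uses minibatches of 50 documents; the batch size of 5 here is chosen only to fit the toy data.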