Stochastic Expectation Maximization with Variance Reduction
Authors: Jianfei Chen, Jun Zhu, Yee Whye Teh, Tong Zhang
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare sEM-vr with batch EM, sEM and other algorithms on Gaussian mixture models and probabilistic latent semantic analysis, and sEM-vr converges significantly faster than these baselines. |
| Researcher Affiliation | Collaboration | Dept. of Comp. Sci. & Tech., BNRist Center, State Key Lab for Intell. Tech. & Sys., Institute for AI, THBI Lab, Tsinghua University, Beijing, 100084, China; Department of Statistics, University of Oxford; Tencent AI Lab |
| Pseudocode | Yes | We have pseudocode for sEM and sEM-vr in Appendix D. (A hedged sketch of the variance-reduced update appears after this table.) |
| Open Source Code | No | The paper does not provide concrete access to source code, such as a specific repository link or an explicit statement about code release for the methodology described. |
| Open Datasets | Yes | We compare sEM-vr with bEM and sEM (SCVB0), which is the state-of-the-art algorithm for pLSA, on four datasets listed in Table 1. ... NIPS [1] ... NYTimes [1] ... Wiki [38] ... PubMed [1]. [1] Arthur Asuncion and David Newman. UCI Machine Learning Repository, 2007. |
| Dataset Splits | No | The paper mentions assessing convergence on the training objective and holding out a testing set, but it does not provide specific details about training, validation, and test dataset splits (e.g., percentages or sample counts for each split). |
| Hardware Specification | Yes | The testing machine has two 12-core Xeon E5-2692v2 CPUs and 64GB main memory. |
| Software Dependencies | No | The paper states that 'All the algorithms are implemented in C++', but does not provide specific version numbers for the C++ compiler or any other software libraries or dependencies used. |
| Experiment Setup | Yes | For each dataset and number of topics K ∈ {50, 100}, we first select the hyperparameters by a grid search over Kα ∈ {0.1, 1, 10, 100} and β ∈ {0.01, 0.1, 1}. Then, we do another grid search to choose the step size. For sEM-vr, we choose ρ ∈ {0.01, 0.02, 0.05, 0.1, 0.2}, and for all other stochastic algorithms, we set ρ_t = a/(t + t0)^κ and choose a ∈ {10^-7, ..., 10^0}, t0 ∈ {10, 100, 1000} and κ ∈ {0.5, 0.75, 1}. Finally, we repeat 5 runs with different random seeds for each algorithm with its best step size. E is 20 for NIPS and NYTimes, and 5 for Wiki and PubMed. M is 50 for NIPS and 500 for all the other datasets. (A sketch of the step-size schedule also follows the table.) |
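
The paper's actual pseudocode for sEM and sEM-vr is in Appendix D and is not reproduced here (the authors' implementation is in C++). To illustrate the idea, below is a minimal sketch on a toy one-dimensional Gaussian mixture with unit variances, assuming the standard SVRG-style control variate on the expected sufficient statistics, i.e. s ← (1 − ρ)s + ρ(ŝ_i(θ) − ŝ_i(θ̃) + s̄(θ̃)). The function names (`estep_stats`, `mstep`, `sem_vr`) and the numerical clipping are illustrative assumptions, not the authors' code.

```python
import numpy as np

def estep_stats(x, mu, pi):
    """Expected sufficient statistics for one point x under a 1-D GMM
    with means mu, weights pi, and fixed unit variances.
    Returns the length-2K vector (responsibilities, responsibility * x)."""
    logp = -0.5 * (x - mu) ** 2 + np.log(pi)
    r = np.exp(logp - logp.max())
    r /= r.sum()
    return np.concatenate([r, r * x])

def mstep(s, K):
    """Map averaged sufficient statistics back to parameters.
    Clipping guards this sketch against slightly invalid statistics,
    which the control variate can produce."""
    r_sum, rx_sum = s[:K], s[K:]
    pi = np.clip(r_sum, 1e-12, None)
    pi /= pi.sum()
    mu = rx_sum / np.clip(r_sum, 1e-12, None)
    return mu, pi

def sem_vr(data, K, rho=0.05, epochs=5, seed=0):
    rng = np.random.default_rng(seed)
    mu, pi = rng.normal(size=K), np.full(K, 1.0 / K)
    n = len(data)
    # Initialize the running statistics with one full-batch E-step.
    s = np.mean([estep_stats(x, mu, pi) for x in data], axis=0)
    mu, pi = mstep(s, K)
    for _ in range(epochs):
        # Snapshot the parameters and compute full-batch statistics at them.
        mu_t, pi_t = mu.copy(), pi.copy()
        s_bar = np.mean([estep_stats(x, mu_t, pi_t) for x in data], axis=0)
        for _ in range(n):
            i = rng.integers(n)
            # SVRG-style control variate on the per-point statistics.
            g = (estep_stats(data[i], mu, pi)
                 - estep_stats(data[i], mu_t, pi_t) + s_bar)
            s = (1 - rho) * s + rho * g  # constant step size, as in sEM-vr
            mu, pi = mstep(s, K)
    return mu, pi
```

Note the contrast with plain sEM, which would use s ← (1 − ρ_t)s + ρ_t ŝ_i(θ) with a decaying ρ_t; the control variate is what lets sEM-vr keep ρ constant.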
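
The Experiment Setup row contrasts the constant step size ρ used by sEM-vr with the Robbins–Monro decay ρ_t = a/(t + t0)^κ used by the other stochastic baselines. Below is a minimal sketch of that schedule and the grid the paper searches over; the helper name `robbins_monro` is an assumption for illustration.

```python
import itertools

def robbins_monro(a, t0, kappa):
    """Decaying schedule rho_t = a / (t + t0)^kappa used by the
    stochastic baselines (sEM-vr itself uses a constant rho)."""
    def rho(t):
        return a / (t + t0) ** kappa
    return rho

# The grid searched in the paper: a in {1e-7, ..., 1e0},
# t0 in {10, 100, 1000}, kappa in {0.5, 0.75, 1}.
grid = itertools.product(
    [10.0 ** e for e in range(-7, 1)],  # a
    [10, 100, 1000],                    # t0
    [0.5, 0.75, 1.0],                   # kappa
)
```

Per the setup above, each baseline would be run once per (a, t0, κ) combination, the best schedule kept, and the winning configuration then repeated over 5 random seeds.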