WHAI: Weibull Hybrid Autoencoding Inference for Deep Topic Modeling
Authors: Hao Zhang, Bo Chen, Dandan Guo, Mingyuan Zhou
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The effectiveness and efficiency of WHAI are illustrated with experiments on big corpora. |
| Researcher Affiliation | Academia | Hao Zhang, Bo Chen & Dandan Guo, National Laboratory of Radar Signal Processing, Collaborative Innovation Center of Information Sensing and Understanding, Xidian University, Xi'an, China. zhanghao_xidian@163.com bchen@mail.xidian.edu.cn gdd_xidian@126.com. Mingyuan Zhou, McCombs School of Business, The University of Texas at Austin, Austin, TX 78712, USA. Mingyuan.Zhou@mccombs.utexas.edu |
| Pseudocode | Yes | Algorithm 1 Hybrid stochastic-gradient MCMC and autoencoding variational inference for WHAI |
| Open Source Code | No | The paper states 'Our code is written in Theano (Theano Development Team, 2016).' but does not provide a specific link or explicit statement about releasing the source code for WHAI. |
| Open Datasets | Yes | We compare the performance of different algorithms on 20Newsgroups (20News), Reuters Corpus Volume I (RCV1), and Wikipedia (Wiki)... Wiki, with a vocabulary size of 7,702, consists of 10 million documents randomly downloaded from Wikipedia using the script provided for Hoffman et al. (2010). |
| Dataset Splits | No | 'for each corpus, we randomly select 70% of the word tokens from each document to form a training matrix T, holding out the remaining 30% to form a testing matrix Y.' The paper specifies training and testing splits but does not mention a validation split. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments (e.g., GPU/CPU models, memory, or cloud instance types). |
| Software Dependencies | No | The paper states 'Our code is written in Theano (Theano Development Team, 2016).', which refers to the framework but does not provide a specific version number for Theano or other software dependencies. |
| Experiment Setup | Yes | For the proposed model, we set the mini-batch size as 200, and use as burn-in 2000 mini-batches for both 20News and RCV1 and 3500 for Wiki. We collect 3000 samples after burn-in to calculate perplexity. The hyperparameters of WHAI are set as: η^(l) = 1/K_l, r = 1, and c_n^(l) = 1. |
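The authors' Theano implementation is not linked, but the Weibull reparameterization that WHAI's autoencoding inference relies on is straightforward to reproduce. Below is a minimal NumPy sketch (not the authors' code; the function name and parameter values are illustrative): if u ~ Uniform(0, 1), then x = λ(−ln u)^(1/k) is distributed Weibull(k, λ), so samples are a differentiable function of the shape k and scale λ.

```python
import math
import numpy as np

def weibull_reparam_sample(k, lam, size, rng):
    """Draw Weibull(k, lam) samples via the inverse-CDF reparameterization.

    If u ~ Uniform(0, 1), then lam * (-ln u) ** (1 / k) ~ Weibull(k, lam),
    so the sample is a deterministic, differentiable function of (k, lam).
    """
    u = rng.uniform(low=1e-12, high=1.0, size=size)  # lower bound avoids log(0)
    return lam * (-np.log(u)) ** (1.0 / k)

rng = np.random.default_rng(0)
k, lam = 2.0, 1.5  # illustrative shape/scale, not values from the paper
samples = weibull_reparam_sample(k, lam, size=100_000, rng=rng)

# Sanity check: the empirical mean should approach the analytic mean
# lam * Gamma(1 + 1/k) of the Weibull distribution.
analytic_mean = lam * math.gamma(1.0 + 1.0 / k)
print(abs(samples.mean() - analytic_mean) < 0.05)  # True
```

In the paper this reparameterization is what lets the encoder network be trained with low-variance stochastic gradients, while the global topic parameters are updated by stochastic-gradient MCMC (Algorithm 1).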