Autoencoding Variational Inference For Topic Models
Authors: Akash Srivastava, Charles Sutton
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We run experiments on both the 20 Newsgroups (11,000 training instances with a 2,000-word vocabulary) and RCV1 Volume 2 (800K training instances with a 10,000-word vocabulary) datasets. Our preprocessing involves tokenization, removal of some non-UTF-8 characters for 20 Newsgroups, and English stop word removal. We first compare our AVITM inference method with the standard online mean-field variational inference (Hoffman et al., 2010) and collapsed Gibbs sampling (Griffiths & Steyvers, 2004) on the LDA model. |
| Researcher Affiliation | Academia | Akash Srivastava, Informatics Forum, University of Edinburgh, 10 Crichton St, Edinburgh EH8 9AB, UK (akash.srivastava@ed.ac.uk); Charles Sutton, Informatics Forum, University of Edinburgh, 10 Crichton St, Edinburgh EH8 9AB, UK (csutton@inf.ed.ac.uk). Additional affiliation: Alan Turing Institute, British Library, 96 Euston Road, London NW1 2DB. |
| Pseudocode | Yes | Algorithm 1: LDA as a generative model. For each document w do: draw topic distribution θ ∼ Dirichlet(α); for each word at position n do: sample topic z_n ∼ Multinomial(1, θ); sample word w_n ∼ Multinomial(1, β_{z_n}); end; end. (A runnable sketch of this generative process follows the table.) |
| Open Source Code | Yes | Code available at https://github.com/akashgit/autoencoding_vi_for_topic_models |
| Open Datasets | Yes | We run experiments on both the 20 Newsgroups (11,000 training instances with a 2,000-word vocabulary) and RCV1 Volume 2 (800K training instances with a 10,000-word vocabulary) datasets. |
| Dataset Splits | No | The paper states training instances but does not provide explicit validation dataset splits (e.g., percentages or counts for a validation set). |
| Hardware Specification | No | The paper mentions training on 'a single GPU' but does not specify the model or any other specific hardware components like CPU, memory, or cloud instance types. |
| Software Dependencies | No | The paper mentions using 'scikit-learn for DMFVI and mallet (McCallum, 2002) for collapsed Gibbs' and the 'ADAM optimizer (Kingma & Ba, 2015)', but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | Specifically, we train the network with the ADAM optimizer (Kingma & Ba, 2015) using a high moment weight (β1) and learning rate (η). Training at higher rates helps avoid early peaks in the functional space. The problem is that momentum-based training coupled with a high learning rate causes the optimizer to diverge. While explicit gradient clipping helps to a certain extent, we found that batch normalization (Ioffe & Szegedy, 2015) does even better by smoothing out the functional space and hence curbing sudden divergence. Finally, we also found an increase in performance with dropout units when applied to θ to force the network to use more of its capacity. (For both parameters, the precise value was chosen by Bayesian optimization. We found that these values in the with-BN cases were close to the default settings in the ADAM optimizer.) A hedged code sketch of these training choices also follows the table. |
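
The pseudocode row above quotes Algorithm 1, the LDA generative model. As a minimal, hedged illustration, the NumPy sketch below samples one document from that process; the topic count `K`, vocabulary size `V`, document length `doc_len`, and the Dirichlet hyperparameters are illustrative placeholders, not values taken from the paper.

```python
# Minimal sketch of the LDA generative process in Algorithm 1 (NumPy only).
# K, V, doc_len, alpha, and beta are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(0)

K, V, doc_len = 5, 2000, 50                      # topics, vocabulary size, words per document
alpha = np.full(K, 0.1)                          # symmetric Dirichlet prior over topic proportions
beta = rng.dirichlet(np.full(V, 0.01), size=K)   # K topic-word distributions (rows sum to 1)

def generate_document():
    """Draw one document following Algorithm 1."""
    theta = rng.dirichlet(alpha)          # θ ~ Dirichlet(α)
    words = []
    for _ in range(doc_len):
        z = rng.choice(K, p=theta)        # z_n ~ Multinomial(1, θ)
        w = rng.choice(V, p=beta[z])      # w_n ~ Multinomial(1, β_{z_n})
        words.append(w)
    return words

doc = generate_document()                 # list of word indices for one synthetic document
```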
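The experiment-setup row quotes the paper's training choices: ADAM with a high momentum weight β1 and learning rate, batch normalization to curb divergence, and dropout applied to θ. The sketch below, written in PyTorch purely for illustration, shows one way those pieces could fit together; the layer sizes, `lr=0.002`, `betas=(0.99, 0.999)`, and the dropout rate are assumptions, and the full AVITM inference network (logistic-normal reparameterization, decoder, and ELBO) is omitted.

```python
# Hedged sketch of the quoted training choices (high-momentum ADAM, batch norm,
# dropout on θ). All hyperparameter values here are assumptions, not the paper's.
import torch
import torch.nn as nn

V, H, K = 2000, 100, 50           # vocabulary size, hidden units, topics (illustrative)

encoder = nn.Sequential(
    nn.Linear(V, H), nn.Softplus(),
    nn.Linear(H, H), nn.Softplus(),
    nn.Linear(H, K),
    nn.BatchNorm1d(K),            # batch normalization, per the quoted setup
)
drop_theta = nn.Dropout(p=0.2)    # dropout applied to θ (rate is an assumption)

optimizer = torch.optim.Adam(
    encoder.parameters(),
    lr=0.002,                     # "high" learning rate (assumed value)
    betas=(0.99, 0.999),          # high moment weight β1 (assumed value)
)

bow = torch.rand(32, V)                                   # fake batch of bag-of-words vectors
theta = drop_theta(torch.softmax(encoder(bow), dim=1))    # document-topic proportions θ
```

In this sketch the batch-norm layer sits on the encoder output and dropout is applied after the softmax that produces θ, matching the quote's description of dropout "applied to θ"; where exactly these layers sit in the released implementation should be checked against the linked repository.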