Latent LSTM Allocation: Joint Clustering and Non-Linear Dynamic Modeling of Sequence Data

Authors: Manzil Zaheer, Amr Ahmed, Alexander J. Smola

ICML 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility assessment. Each entry below gives the reproducibility variable, the result, and the supporting LLM response.

Research Type: Experimental
LLM Response: "Our experimental evaluations show that the proposed model compares favorably with several state-of-the-art baselines. We present an efficient Stochastic EM inference algorithm for our model that scales to millions of users/documents."

Researcher Affiliation: Collaboration
LLM Response: "Manzil Zaheer 1, Amr Ahmed 2, Alexander J. Smola 1. 1 Carnegie Mellon University, Pittsburgh, PA (work done while at Google); 2 Google Inc., Mountain View, CA. Correspondence to: Manzil Zaheer <manzil@cmu.edu>."

Pseudocode: Yes
LLM Response: "Algorithm 1 Stochastic EM for LLA" (a hedged sketch of such a stochastic EM loop is given below, after the assessment entries).

Open Source Code: No
LLM Response: The paper mentions "Available at http://manzil.ml/lla.html" in a footnote. However, this link does not provide concrete access to the source code for the methodology described in the paper.

Open Datasets: Yes
LLM Response: "For reproducibility we focus on the task of language modeling over the publicly available Wikipedia dataset, and for generality, we show additional experiments on the less-structured domain of user modeling."

Dataset Splits: No
LLM Response: "For all experiments we follow the standard setup for evaluating temporal models, i.e. divide each document (user history) into 60% for training and 40% for testing. All hyper-parameters of the models were tuned over a development set." The paper mentions a development set (typically used for validation) and a 60% training split, but it does not explicitly state the size or percentage of that development set. (A minimal sketch of the per-document split is given below.)

Hardware Specification: No
LLM Response: The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.

Software Dependencies: No
LLM Response: The paper mentions "automatic differentiation software packages such as TensorFlow" and that models were trained "using stochastic gradient descent with Adam", but it does not provide specific version numbers for these software components.

Experiment Setup: Yes
LLM Response: "Unless otherwise stated, we used 1000 topics for LLA and LDA variants. For LSTM and LLA variants, we selected the dimensions of the input embedding (word or topic) and evolving latent state (over words or topics) in the range of {50, 150, 250}. In case of character-based models, we tuned the dimensions of the character embedding and latent state (over characters) in the range of {50, 100, 150}. We trained all deep models using stochastic gradient descent with Adam (Kingma & Ba, 2014)." (These settings are restated as a hyperparameter sketch below.)
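
The Pseudocode entry above refers to the paper's Algorithm 1, "Stochastic EM for LLA". As a reading aid only, the following is a minimal Python sketch of what such a stochastic EM loop can look like, assuming an LDA-style word-topic count model for the E-step and treating the LSTM update as a placeholder. The function names (lstm_topic_probs, e_step, m_step) and all numeric values are illustrative assumptions, not the authors' implementation.

# Hedged sketch (not the authors' code) of a stochastic EM loop in the spirit of
# Algorithm 1 "Stochastic EM for LLA": the E-step samples a topic for each word from
# the LSTM's predictive distribution over topics times an LDA-style word-topic count
# model; the M-step would update the LSTM by SGD/Adam and is left as a placeholder.
import numpy as np

K, V = 10, 1000                              # illustrative: number of topics, vocabulary size
beta = 0.1                                   # Dirichlet smoothing for the topic-word counts
word_topic = np.full((V, K), beta)           # sufficient statistics: word-topic counts
topic_total = word_topic.sum(axis=0)

def lstm_topic_probs(prev_topics):
    # Placeholder for the LSTM's distribution over the next topic given the topic
    # history; uniform here so the sketch stays self-contained.
    return np.full(K, 1.0 / K)

def e_step(doc, rng):
    # Sample a topic assignment for every word of one document (count decrements for
    # previously sampled assignments are omitted for brevity).
    topics = []
    for w in doc:
        prior = lstm_topic_probs(topics)            # p(z_t | z_{<t}) from the LSTM
        likelihood = word_topic[w] / topic_total    # p(w_t | z_t) from the counts
        probs = prior * likelihood
        probs /= probs.sum()
        z = rng.choice(K, p=probs)
        word_topic[w, z] += 1                       # update sufficient statistics
        topic_total[z] += 1
        topics.append(z)
    return topics

def m_step(docs, topic_seqs):
    # Placeholder: in the paper the LSTM over topic sequences is trained with
    # stochastic gradient descent (Adam) on the sampled topic sequences.
    pass

rng = np.random.default_rng(0)
toy_docs = [rng.integers(V, size=20) for _ in range(8)]   # toy corpus of word-id sequences
for epoch in range(3):                                    # stochastic EM over the corpus
    sampled = [e_step(doc, rng) for doc in toy_docs]
    m_step(toy_docs, sampled)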
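
The Dataset Splits entry quotes a per-document temporal split: the first 60% of each document or user history is used for training and the remaining 40% for testing. Below is a minimal sketch of that split, under the assumption that each sequence is simply cut at the 60% mark; the function name and toy data are illustrative.

# Minimal sketch (not from the paper) of the 60%/40% per-document temporal split.
def temporal_split(sequences, train_frac=0.6):
    train, test = [], []
    for seq in sequences:
        cut = int(len(seq) * train_frac)   # prefix for training, suffix for testing
        train.append(seq[:cut])
        test.append(seq[cut:])
    return train, test

# Toy usage: three "documents" given as lists of token ids.
train_set, test_set = temporal_split([[1, 2, 3, 4, 5], [7, 8, 9, 10], [11, 12]])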
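
Finally, the Experiment Setup entry can be read as a hyperparameter grid. The dictionary below restates the reported settings; the key names and structure are assumptions made here for illustration, and only the values come from the quoted description.

# Reported experiment settings restated as an illustrative config/search grid
# (key names are assumptions; values are taken from the paper's description).
experiment_setup = {
    "num_topics": 1000,                         # topics for LLA and LDA variants
    "word_or_topic_models": {
        "embedding_dim_grid": [50, 150, 250],   # input embedding sizes searched
        "state_dim_grid": [50, 150, 250],       # LSTM latent-state sizes searched
    },
    "character_models": {
        "embedding_dim_grid": [50, 100, 150],
        "state_dim_grid": [50, 100, 150],
    },
    "optimizer": "Adam",                        # stochastic gradient descent with Adam (Kingma & Ba, 2014)
}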