Latent LSTM Allocation: Joint Clustering and Non-Linear Dynamic Modeling of Sequence Data

Authors: Manzil Zaheer, Amr Ahmed, Alexander J. Smola

ICML 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility assessment. Each entry below gives the reproducibility variable, the result, and the supporting LLM response.

Research Type: Experimental
LLM Response: "Our experimental evaluations show that the proposed model compares favorably with several state-of-the-art baselines. We present an efficient Stochastic EM inference algorithm for our model that scales to millions of users/documents."

Researcher Affiliation: Collaboration
LLM Response: "Manzil Zaheer 1, Amr Ahmed 2, Alexander J. Smola 1. 1 Carnegie Mellon University, Pittsburgh, PA (work done while at Google); 2 Google Inc., Mountain View, CA. Correspondence to: Manzil Zaheer <manzil@cmu.edu>."

Pseudocode: Yes
LLM Response: "Algorithm 1 Stochastic EM for LLA" (a hedged sketch of such a stochastic EM loop is given below, after the assessment entries).

Open Source Code: No
LLM Response: The paper mentions "Available at http://manzil.ml/lla.html" in a footnote. However, this link does not provide concrete access to the source code for the methodology described in the paper.

Open Datasets: Yes
LLM Response: "For reproducibility we focus on the task of language modeling over the publicly available Wikipedia dataset, and for generality, we show additional experiments on the less-structured domain of user modeling."

Dataset Splits: No
LLM Response: "For all experiments we follow the standard setup for evaluating temporal models, i.e. divide each document (user history) into 60% for training and 40% for testing. All hyper-parameters of the models were tuned over a development set." The paper mentions a development set (typically used for validation) and a 60% training split, but it does not explicitly state the size or percentage of that development set. (A minimal sketch of the per-document split is given below.)

Hardware Specification: No
LLM Response: The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.

Software Dependencies: No
LLM Response: The paper mentions "automatic differentiation software packages such as TensorFlow" and that models were trained "using stochastic gradient descent with Adam", but it does not provide specific version numbers for these software components.

Experiment Setup: Yes
LLM Response: "Unless otherwise stated, we used 1000 topics for LLA and LDA variants. For LSTM and LLA variants, we selected the dimensions of the input embedding (word or topic) and evolving latent state (over words or topics) in the range of {50, 150, 250}. In case of character-based models, we tuned the dimensions of the character embedding and latent state (over characters) in the range of {50, 100, 150}. We trained all deep models using stochastic gradient descent with Adam (Kingma & Ba, 2014)." (These settings are restated as a hyperparameter sketch below.)
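
The Pseudocode entry above refers to the paper's Algorithm 1, "Stochastic EM for LLA". As a reading aid only, the following is a minimal Python sketch of what such a stochastic EM loop can look like, assuming an LDA-style word-topic count model for the E-step and treating the LSTM update as a placeholder. The function names (lstm_topic_probs, e_step, m_step) and all numeric values are illustrative assumptions, not the authors' implementation.

# Hedged sketch (not the authors' code) of a stochastic EM loop in the spirit of
# Algorithm 1 "Stochastic EM for LLA": the E-step samples a topic for each word from
# the LSTM's predictive distribution over topics times an LDA-style word-topic count
# model; the M-step would update the LSTM by SGD/Adam and is left as a placeholder.
import numpy as np

K, V = 10, 1000                              # illustrative: number of topics, vocabulary size
beta = 0.1                                   # Dirichlet smoothing for the topic-word counts
word_topic = np.full((V, K), beta)           # sufficient statistics: word-topic counts
topic_total = word_topic.sum(axis=0)

def lstm_topic_probs(prev_topics):
    # Placeholder for the LSTM's distribution over the next topic given the topic
    # history; uniform here so the sketch stays self-contained.
    return np.full(K, 1.0 / K)

def e_step(doc, rng):
    # Sample a topic assignment for every word of one document (count decrements for
    # previously sampled assignments are omitted for brevity).
    topics = []
    for w in doc:
        prior = lstm_topic_probs(topics)            # p(z_t | z_{<t}) from the LSTM
        likelihood = word_topic[w] / topic_total    # p(w_t | z_t) from the counts
        probs = prior * likelihood
        probs /= probs.sum()
        z = rng.choice(K, p=probs)
        word_topic[w, z] += 1                       # update sufficient statistics
        topic_total[z] += 1
        topics.append(z)
    return topics

def m_step(docs, topic_seqs):
    # Placeholder: in the paper the LSTM over topic sequences is trained with
    # stochastic gradient descent (Adam) on the sampled topic sequences.
    pass

rng = np.random.default_rng(0)
toy_docs = [rng.integers(V, size=20) for _ in range(8)]   # toy corpus of word-id sequences
for epoch in range(3):                                    # stochastic EM over the corpus
    sampled = [e_step(doc, rng) for doc in toy_docs]
    m_step(toy_docs, sampled)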
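
The Dataset Splits entry quotes a per-document temporal split: the first 60% of each document or user history is used for training and the remaining 40% for testing. Below is a minimal sketch of that split, under the assumption that each sequence is simply cut at the 60% mark; the function name and toy data are illustrative.

# Minimal sketch (not from the paper) of the 60%/40% per-document temporal split.
def temporal_split(sequences, train_frac=0.6):
    train, test = [], []
    for seq in sequences:
        cut = int(len(seq) * train_frac)   # prefix for training, suffix for testing
        train.append(seq[:cut])
        test.append(seq[cut:])
    return train, test

# Toy usage: three "documents" given as lists of token ids.
train_set, test_set = temporal_split([[1, 2, 3, 4, 5], [7, 8, 9, 10], [11, 12]])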
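
Finally, the Experiment Setup entry can be read as a hyperparameter grid. The dictionary below restates the reported settings; the key names and structure are assumptions made here for illustration, and only the values come from the quoted description.

# Reported experiment settings restated as an illustrative config/search grid
# (key names are assumptions; values are taken from the paper's description).
experiment_setup = {
    "num_topics": 1000,                         # topics for LLA and LDA variants
    "word_or_topic_models": {
        "embedding_dim_grid": [50, 150, 250],   # input embedding sizes searched
        "state_dim_grid": [50, 150, 250],       # LSTM latent-state sizes searched
    },
    "character_models": {
        "embedding_dim_grid": [50, 100, 150],
        "state_dim_grid": [50, 100, 150],
    },
    "optimizer": "Adam",                        # stochastic gradient descent with Adam (Kingma & Ba, 2014)
}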