Latent LSTM Allocation: Joint Clustering and Non-Linear Dynamic Modeling of Sequence Data
Authors: Manzil Zaheer, Amr Ahmed, Alexander J. Smola
ICML 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present an efficient Stochastic EM inference algorithm for our model that scales to millions of users/documents. Our experimental evaluations show that the proposed model compares favorably with several state-of-the-art baselines. |
| Researcher Affiliation | Collaboration | Manzil Zaheer (Carnegie Mellon University, Pittsburgh, PA; work done while at Google), Amr Ahmed (Google Inc., Mountain View, CA), Alexander J. Smola (Carnegie Mellon University, Pittsburgh, PA; work done while at Google). Correspondence to: Manzil Zaheer <manzil@cmu.edu>. |
| Pseudocode | Yes | Algorithm 1 Stochastic EM for LLA; a hedged sketch of such a loop is given after the table. |
| Open Source Code | No | The paper mentions "Available at http://manzil.ml/lla.html" in a footnote. However, this link does not provide concrete access to the source code for the methodology described in the paper. |
| Open Datasets | Yes | For reproducibility we focus on the task of language modeling over the publicly available Wikipedia dataset, and for generality, we show additional experiments on the less-structured domain of user modeling. |
| Dataset Splits | No | For all experiments we follow the standard setup for evaluating temporal models, i.e. divide each document (user history) into 60% for training and 40% for testing. All hyper-parameters of the models were tuned over a development set. The paper thus reports a 60%/40% train/test split and a development (validation) set, but it does not specify the size or percentage of that development set. A minimal sketch of the per-document split is given after the table. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions "Automatic differentiation software packages such as TensorFlow" and that models were trained "using stochastic gradient descent with Adam", but it does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | Unless otherwise stated, we used 1000 topics for LLA and LDA variants. For LSTM and LLA variants, we selected the dimensions of the input embedding (word or topic) and evolving latent state (over words or topics) in the range of {50, 150, 250}. In case of character-based models, we tuned the dimensions of the character embedding and latent state (over characters) in the range of {50, 100, 150}. We trained all deep models using stochastic gradient descent with Adam (Kingma & Ba, 2014). The reported ranges are encoded in the illustrative grid after the table. |
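
The Pseudocode row above refers to the paper's "Algorithm 1 Stochastic EM for LLA". Below is a minimal, hedged sketch of such a loop, not the authors' implementation: the `UniformTopicLSTM` stand-in, the `topic_probs`/`fit` interface, and the collapsed-Gibbs-style E-step are assumptions made for illustration only.

```python
# Hedged sketch of a stochastic EM loop in the spirit of "Algorithm 1
# Stochastic EM for LLA". All names below are placeholder assumptions.
import numpy as np

class UniformTopicLSTM:
    """Stand-in for the topic-level LSTM: uniform prior over topics and a
    no-op M-step. A real implementation would be a recurrent network that
    predicts p(z_t | z_1..z_{t-1}) and is trained with Adam."""
    def __init__(self, n_topics):
        self.n_topics = n_topics

    def topic_probs(self, history):
        return np.full(self.n_topics, 1.0 / self.n_topics)

    def fit(self, topic_sequences):
        pass  # one SGD/Adam update on the sampled topic sequences

def stochastic_em_step(docs, topic_word_counts, lstm, beta=0.01):
    """One stochastic EM sweep over a minibatch of documents.

    docs: list of word-id sequences.
    topic_word_counts: (n_topics, vocab_size) sufficient statistics.
    """
    n_topics, vocab_size = topic_word_counts.shape
    sampled_topic_seqs = []

    # E-step: sample a topic for every word, combining the LSTM's dynamic
    # prior over topics with a smoothed topic-word likelihood.
    for doc in docs:
        z_seq = []
        for w in doc:
            prior = lstm.topic_probs(z_seq)                    # p(z_t | z_<t)
            likelihood = (topic_word_counts[:, w] + beta) / (
                topic_word_counts.sum(axis=1) + beta * vocab_size)
            p = prior * likelihood
            z = int(np.random.choice(n_topics, p=p / p.sum()))
            z_seq.append(z)
            topic_word_counts[z, w] += 1                       # update counts
        sampled_topic_seqs.append(z_seq)

    # M-step: update the LSTM parameters on the sampled topic sequences.
    lstm.fit(sampled_topic_seqs)
    return topic_word_counts
```

A toy run would initialize `topic_word_counts = np.ones((n_topics, vocab_size))` and call `stochastic_em_step` repeatedly over minibatches, replacing `UniformTopicLSTM` with a real topic-level LSTM.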
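
The Dataset Splits row quotes a 60%/40% per-document (per-user-history) split. The sketch below simply encodes that split; `split_corpus` and its signature are illustrative, not taken from the paper.

```python
# Minimal sketch of the reported 60%/40% per-document split: the first 60%
# of each document (or user history) is used for training, the rest for test.
def split_sequence(tokens, train_frac=0.6):
    """Split a single token sequence into train/test prefixes."""
    cut = int(len(tokens) * train_frac)
    return tokens[:cut], tokens[cut:]

def split_corpus(corpus, train_frac=0.6):
    """Apply the per-document split to a list of token sequences."""
    train, test = [], []
    for doc in corpus:
        tr, te = split_sequence(doc, train_frac)
        train.append(tr)
        test.append(te)
    return train, test
```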
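
The Experiment Setup row lists the reported hyperparameter ranges. The grid below only encodes those numbers; the enumeration function and the assumption that embedding and latent-state sizes were tuned independently are illustrative, since the paper does not describe the exact search procedure.

```python
# Illustrative encoding of the reported hyperparameters; the search procedure
# itself is an assumption, as the paper only gives the candidate ranges.
from itertools import product

N_TOPICS = 1000                       # LLA and LDA variants

WORD_TOPIC_DIMS = (50, 150, 250)      # word/topic embedding and latent-state sizes
CHAR_DIMS = (50, 100, 150)            # character embedding and latent-state sizes

def candidate_configs(dims):
    """Enumerate (embedding_dim, state_dim) pairs over a candidate range."""
    for emb_dim, state_dim in product(dims, dims):
        yield {"embedding_dim": emb_dim,
               "state_dim": state_dim,
               "optimizer": "adam"}   # Adam (Kingma & Ba, 2014)

word_level_grid = list(candidate_configs(WORD_TOPIC_DIMS))
char_level_grid = list(candidate_configs(CHAR_DIMS))
```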