A Linear Dynamical System Model for Text

Authors: David Belanger, Sham Kakade

ICML 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, we employ our inferred word embeddings as features in standard tagging tasks, obtaining significant accuracy improvements. |
| Researcher Affiliation | Collaboration | David Belanger (belanger@cs.umass.edu), College of Information and Computer Sciences, University of Massachusetts Amherst; Sham Kakade (skakade@microsoft.com), Microsoft Research |
| Pseudocode | Yes | Algorithm 1: Learning an LDS for Text (a hedged sketch of this SSID-style estimator follows the table) |
| Open Source Code | No | We will release the code of our implementation. SSID requires simple scripting on top of a sparse linear algebra library. Our EM implementation consists of small modifications to Martens' public ASOS code. |
| Open Datasets | Yes | We fit our LDS using a combination of the APNews, New York Times, and RCV1 newswire corpora, about 1B tokens total. [...] We train the tagging model on the Penn Treebank (PTB) train set, which is not included for LDS training. |
| Dataset Splits | Yes | The LDS hyperparameters were selected by maximizing the accuracy of a local classifier on the PTB dev set. |
| Hardware Specification | Yes | Overall, we found that the LDS and Word2Vec took about 12 hours to train on a single-core CPU. [...] The time to train the LDS, about 30 minutes, is inconsequential compared to training the RNN (4 days) on a single CPU core. |
| Software Dependencies | No | The paper mentions using a "sparse linear algebra library" and Martens' public ASOS code but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | We employ r = 4 for SSID, r = 7 for EM, and h = 200. We add 1000 pseudocounts for each type, by adding 1000/T to each coordinate of µ. [...] Our local classifier was a two-layer neural network with 25 hidden units, which outperformed a linear classifier. The best Word2Vec configuration used the CBOW architecture with a window width of 3. [...] This initializes parameters randomly, with lengthscales tuned as in Mikolov (2012). [...] We tuned the initial value and decay rate. (see the smoothing and Word2Vec configuration sketches below) |
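
The paper's Algorithm 1 learns the LDS spectrally, and the authors describe SSID as "simple scripting on top of a sparse linear algebra library": with one-hot word observations, the lagged cross-covariances used by subspace identification reduce to (scaled) sparse co-occurrence count matrices. The following is a minimal sketch of that style of estimator under stated assumptions, not the authors' released code; the function names, the single-block-column Hankel matrix, and the omission of mean-centering and covariance scaling are simplifications introduced here.

```python
# Hedged sketch of SVD-based subspace identification (SSID) for a text LDS.
# Assumes one-hot word observations, so Cov(x_{t+i}, x_t) is approximated by
# a normalized sparse co-occurrence count matrix (centering omitted here).
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import svds

def lagged_cooccurrence(token_ids, vocab_size, lag):
    """Sparse V x V matrix: entry (i, j) counts word i occurring `lag` steps after j."""
    rows, cols = token_ids[lag:], token_ids[:-lag]
    counts = sp.coo_matrix((np.ones(len(rows)), (rows, cols)),
                           shape=(vocab_size, vocab_size)).tocsr()
    return counts / len(rows)

def ssid_embeddings(token_ids, vocab_size, r=4, h=200):
    """Estimate the LDS observation matrix C; its rows act as word embeddings.

    r = 4 and h = 200 mirror the hyperparameters quoted in the table above.
    """
    # Stack lag-1..lag-r statistics into a tall (rV x V) Hankel-style matrix.
    hankel = sp.vstack([lagged_cooccurrence(token_ids, vocab_size, i)
                        for i in range(1, r + 1)])
    # Rank-h truncated SVD; the top V rows of U * sqrt(S) recover C up to a
    # similarity transform, as in Ho-Kalman-style spectral methods.
    U, s, _ = svds(hankel, k=h)
    return U[:vocab_size, :] * np.sqrt(s)
```

Everything here stays sparse until the truncated SVD, which is consistent with the quoted remark that LDS training is inexpensive (about 30 minutes) relative to the RNN baseline.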
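The pseudocount line in the Experiment Setup row reads naturally as additive smoothing of the empirical unigram mean: with T training tokens, 1000 pseudocounts per word type adds 1000/T to every coordinate of µ. A one-line sketch of that reading (the function name is an assumption, and whether the authors renormalize afterwards is not stated in the excerpt):

```python
import numpy as np

def smoothed_unigram_mean(counts: np.ndarray, pseudocount: float = 1000.0) -> np.ndarray:
    """Empirical mean of one-hot observations with `pseudocount` added per type."""
    T = counts.sum()                   # total number of training tokens
    return (counts + pseudocount) / T  # equivalently: mu + 1000/T per coordinate
```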
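The best Word2Vec baseline quoted above used the CBOW architecture with a window width of 3. Below is a hedged reconstruction of that configuration using the gensim library; the authors' actual tool is not named in this excerpt, and the vector size and min_count values are assumptions.

```python
from gensim.models import Word2Vec

# Toy placeholder corpus; the paper's baseline was trained on ~1B newswire tokens.
corpus = [["the", "cat", "sat"], ["the", "dog", "ran"]]

model = Word2Vec(
    sentences=corpus,
    sg=0,             # 0 selects the CBOW architecture, as quoted
    window=3,         # window width of 3, as quoted
    vector_size=200,  # assumption: matched to the LDS state size h = 200
    min_count=1,      # assumption; set low only so the toy corpus is usable
)
word_vectors = model.wv  # embeddings used as features in the tagging tasks
```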