Spectral Learning of Predictive State Representations with Insufficient Statistics

Authors: Alex Kulesza, Nan Jiang, Satinder Singh

AAAI 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility assessment, listing each variable, its result, and the LLM response:
Research Type: Experimental
"We evaluate our approach on both synthetic and real-world problems. Using synthetic HMMs, we show that our method is robust to learning under a variety of transition topologies; compared to a baseline using the shortest tests and histories, our method achieves error rates up to an order of magnitude lower. We also demonstrate significantly improved prediction results on a real-world language modeling task using a large collection of text from Wikipedia."

Researcher Affiliation: Academia
"Computer Science & Engineering, University of Michigan, Ann Arbor, MI, USA"
Pseudocode: Yes
Algorithm 1: Search for sets of k tests and histories that approximately minimize max_o σ1(B_o).
Input: dataset D, initial T and H of size k, distributions p_T / p_H over candidate tests/histories, number of rounds r
  {B_o} := SPECTLEARN(D, T, H)
  σ_opt := max_{o ∈ O} σ1(B_o)
  for i = 1, ..., r do
    sample h' ~ p_H
    for h ∈ H do
      {B_o} := SPECTLEARN(D, T, (H \ {h}) ∪ {h'})
      σ(h) := max_{o ∈ O} σ1(B_o)
    h* := argmin_{h ∈ H} σ(h)
    if σ(h*) < σ_opt then
      σ_opt := σ(h*)
      H := (H \ {h*}) ∪ {h'}
    [Repeat the same procedure for T]
Output: T, H
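The listing above depends on the paper's SPECTLEARN subroutine, so any runnable rendering has to stub that call. The sketch below is a minimal Python version of the history-refinement half of the search (the paper repeats the same procedure for the tests); spect_learn, sample_history, and max_top_singular_value are hypothetical placeholders for the spectral-learning routine, the p_H sampler, and the σ1 computation, not identifiers from the paper.

```python
import numpy as np

def max_top_singular_value(B_by_obs):
    """Max over observations o of the largest singular value sigma_1(B_o)."""
    return max(np.linalg.svd(B, compute_uv=False)[0] for B in B_by_obs.values())

def refine_histories(data, tests, histories, spect_learn, sample_history, rounds):
    """Greedy swap search over the history set H (history half of Algorithm 1).

    spect_learn(data, tests, histories) -> {o: B_o} and sample_history() ~ p_H
    are assumed callables supplied by the caller.
    """
    histories = list(histories)
    sigma_opt = max_top_singular_value(spect_learn(data, tests, histories))
    for _ in range(rounds):
        h_new = sample_history()                      # sample h' ~ p_H
        scores = {}
        for i in range(len(histories)):               # try swapping each h for h'
            trial = histories[:i] + [h_new] + histories[i + 1:]
            scores[i] = max_top_singular_value(spect_learn(data, tests, trial))
        best = min(scores, key=scores.get)            # h* = argmin_h sigma(h)
        if scores[best] < sigma_opt:                  # keep the swap only if it helps
            sigma_opt = scores[best]
            histories[best] = h_new
        # The same swap procedure would then be repeated for the test set T.
    return histories, sigma_opt
```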
Open Source Code: No
No explicit statement about open-source code for the described methodology or a link to a repository was found.

Open Datasets: Yes
"Finally, we apply our algorithm to model a real-world text dataset of over 6.5 million sentences from Wikipedia articles (Sutskever, Martens, and Hinton 2011)."

Dataset Splits: No
"The majority of the data is used for training, but we reserve 100,000 sentences for evaluation."

Hardware Specification: No
No specific hardware details (e.g., GPU/CPU models, memory) were provided for the experimental setup.

Software Dependencies: No
The paper does not provide specific software names with version numbers or other detailed software dependencies.
Experiment Setup: Yes
"Our algorithm is initialized at the baseline T and H, and we sample new tests and histories whose length is one observation longer than the longest sequences in the baseline sets; the sampling probability of a sequence x is proportional to p²(x). We run our algorithm for 10 rounds."
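As a rough illustration of that setup, the snippet below draws candidate sequences with probability proportional to p(x)². The candidate_sequences construction and the empirical_prob helper are assumptions made for illustration; the paper only specifies the length constraint and the p²(x) weighting.

```python
import random

def candidate_sequences(baseline, observations):
    """Candidates one observation longer than the longest baseline sequences.

    Sequences are represented as tuples of observations. How the candidate
    pool is enumerated is an assumption; the paper states only that new
    sequences are one observation longer than the baseline's longest.
    """
    longest = max(len(x) for x in baseline)
    return [x + (o,) for x in baseline if len(x) == longest for o in observations]

def sample_candidate(candidates, empirical_prob):
    """Draw one candidate x with probability proportional to p(x)^2, where
    empirical_prob(x) is an assumed helper giving the empirical probability
    of sequence x under the training data."""
    weights = [empirical_prob(x) ** 2 for x in candidates]
    return random.choices(candidates, weights=weights, k=1)[0]

# The swap search is then run for 10 rounds, as stated in the setup above.
```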