Spectral Learning of Predictive State Representations with Insufficient Statistics
Authors: Alex Kulesza, Nan Jiang, Satinder Singh
AAAI 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach on both synthetic and real-world problems. Using synthetic HMMs, we show that our method is robust to learning under a variety of transition topologies; compared to a baseline using the shortest tests and histories, our method achieves error rates up to an order of magnitude lower. We also demonstrate significantly improved prediction results on a real-world language modeling task using a large collection of text from Wikipedia. |
| Researcher Affiliation | Academia | Computer Science & Engineering, University of Michigan, Ann Arbor, MI, USA |
| Pseudocode | Yes | Algorithm 1 — Search for sets of k tests and histories that approximately minimize max_o σ1(B_o). Input: dataset D, initial T and H of size k, distributions p_T / p_H over candidate tests/histories, number of rounds r. {B_o} := SpectLearn(D, T, H); σ_opt := max_{o∈O} σ1(B_o); for i = 1, …, r do: sample h′ ∼ p_H; for each h ∈ H do: {B_o} := SpectLearn(D, T, (H \ {h}) ∪ {h′}), σ(h) := max_{o∈O} σ1(B_o); h* := argmin_h σ(h); if σ(h*) < σ_opt then σ_opt := σ(h*) and H := (H \ {h*}) ∪ {h′}; [repeat the same procedure for T]. Output: T, H. (A Python sketch of this search appears after the table.) |
| Open Source Code | No | No explicit statement about open-source code for the described methodology or a link to a repository was found. |
| Open Datasets | Yes | Finally, we apply our algorithm to model a real-world text dataset of over 6.5 million sentences from Wikipedia articles (Sutskever, Martens, and Hinton 2011). |
| Dataset Splits | No | The majority of the data is used for training, but we reserve 100,000 sentences for evaluation. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) were provided for the experimental setup. |
| Software Dependencies | No | The paper does not provide specific software names with version numbers or other detailed software dependencies. |
| Experiment Setup | Yes | Our algorithm is initialized at the baseline T and H, and we sample new tests and histories whose length is one observation longer than the longest sequences in the baseline sets; the sampling probability of a sequence x is proportional to p²(x). We run our algorithm for 10 rounds. (See the sampling sketch after the table.) |
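
For readers attempting reproduction, here is a minimal Python sketch of the search in Algorithm 1. It is a reconstruction under stated assumptions, not the authors' code: the helper names (`seq_prob`, `spectral_learn`, `sigma_max`, `search`) are mine, sequence probabilities come from a small ground-truth HMM rather than being estimated from the paper's dataset D, the learning step uses the standard square-case estimate B_o = P_{T,oH} P_{T,H}⁺, and only the history set is refined, with a comment marking where the paper alternates to the test set.

```python
# Hedged reconstruction of Algorithm 1; the helper names and the toy HMM
# below are illustrative assumptions, not the authors' implementation.
import numpy as np

rng = np.random.default_rng(0)

# Small ground-truth HMM over observations {0, 1} so sequence probabilities
# are exact; the paper instead estimates them from a dataset D.
n_states, n_obs = 3, 2
T_mat = rng.dirichlet(np.ones(n_states), size=n_states)  # row i: P(s' | s=i)
E_mat = rng.dirichlet(np.ones(n_obs), size=n_states)     # row i: P(o | s=i)
pi = rng.dirichlet(np.ones(n_states))                    # initial state dist.

def seq_prob(x):
    """P(observation sequence x) via the forward recursion."""
    alpha = pi.copy()
    for o in x:
        alpha = (alpha * E_mat[:, o]) @ T_mat  # emit o, then transition
    return alpha.sum()

def spectral_learn(tests, hists):
    """B_o = P_{T,oH} P_{T,H}^+, a standard spectral PSR estimate.

    tests and hists are lists of observation tuples; P[i, j] = p(h_j . t_i).
    """
    P = np.array([[seq_prob(h + t) for h in hists] for t in tests])
    P_pinv = np.linalg.pinv(P)
    return [np.array([[seq_prob(h + (o,) + t) for h in hists]
                      for t in tests]) @ P_pinv
            for o in range(n_obs)]

def sigma_max(Bs):
    """The search objective: max_o sigma_1(B_o) (largest singular value)."""
    return max(np.linalg.norm(B, 2) for B in Bs)

def search(tests, hists, sample_hist, r=10):
    """Local search over the history set; the paper alternates over T and H."""
    sigma_opt = sigma_max(spectral_learn(tests, hists))
    for _ in range(r):
        h_new = sample_hist()  # h' ~ p_H
        # Try swapping h' in for each current history and score the result.
        scores = [(sigma_max(spectral_learn(tests,
                                            hists[:j] + [h_new] + hists[j + 1:])),
                   hists[:j] + [h_new] + hists[j + 1:])
                  for j in range(len(hists))]
        best_sigma, best_hists = min(scores, key=lambda s: s[0])
        if best_sigma < sigma_opt:  # keep the swap only if it helps
            sigma_opt, hists = best_sigma, best_hists
        # [the paper repeats the same swap procedure for the test set T]
    return tests, hists, sigma_opt
```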
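
Continuing the same sketch, the experiment-setup row can be mirrored as follows: candidates are one observation longer than the longest baseline sequences, sampled with probability proportional to p²(x), and the search runs for 10 rounds. The specific baseline sets below (all length-1 sequences, k = 2) are my own illustrative choice, not the paper's.

```python
import itertools

# Baseline T and H: all length-1 sequences over the toy alphabet (assumption).
baseline = [(o,) for o in range(n_obs)]
tests, hists = list(baseline), list(baseline)

# Candidates are one observation longer than the baseline sequences and are
# sampled with probability proportional to p(x)^2, as in the paper's setup
# (the paper uses empirical estimates of p; seq_prob is exact here).
cands = list(itertools.product(range(n_obs), repeat=2))
w = np.array([seq_prob(c) ** 2 for c in cands])
w /= w.sum()

def sample_hist():
    return cands[rng.choice(len(cands), p=w)]

tests, hists, sigma = search(tests, hists, sample_hist, r=10)
print(hists, sigma)
```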