Learning Multi-Step Predictive State Representations
Authors: Lucas Langer, Borja Balle, Doina Precup
IJCAI 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The paper performs experiments on robot exploration tasks in a wide variety of environments. Through an extensive empirical evaluation, it shows that in environments where characteristic multi-step observations occur frequently, M-PSRs improve the quality of learning with respect to classical PSRs, and that this improvement is uniform over a range of environment sizes, numbers of observation symbols, and amounts of training data. |
| Researcher Affiliation | Academia | Lucas Langer, McGill University; Borja Balle, Lancaster University, United Kingdom; Doina Precup, McGill University |
| Pseudocode | Yes | "Algorithm 1 gives pseudocode for computing this function." Algorithm 2 (Base Selection Algorithm): INPUT: Train, Sub M; OUTPUT: 0 |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | No | The paper uses custom-built simulated environments ("Double Loop maze" and "Pacman-style environment") and generates observation sequences from them. It does not provide concrete access information (link, DOI, formal citation) for a publicly available or open dataset. |
| Dataset Splits | No | The paper discusses using different amounts of training data but does not provide specific details on train/validation/test splits, sample counts for each split, or reference to predefined splits from known datasets. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with versions). |
| Experiment Setup | Yes | For the Base M-PSR, the paper sets b = 2 and K = 8 for both symbols in {g, b} (g for green, b for blue), so that the longest string in 0 is σ^256. For the Tree M-PSR, it sets L = 7 for a total of 128 operators, a far larger limit than for the other M-PSRs. For Double Loop environments, N = 150, while for the Pacman domain, N = 600. The length of each trajectory is fixed at 3(l1 + l2). |
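As a way to check the reported Base M-PSR parameters, the following minimal sketch builds the powers-of-b repetition strings for one observation symbol. It assumes (an assumption, not stated explicitly in the table) that the Base M-PSR keeps strings σ^(b^i) for i = 0..K, which with b = 2 and K = 8 makes the longest string σ^256, matching the setup row. The function name `build_base_strings` is illustrative, not from the paper.

```python
def build_base_strings(symbol: str, b: int = 2, K: int = 8) -> list[str]:
    """Return repetition strings sigma^(b^i) for i = 0..K (assumed Base M-PSR set)."""
    return [symbol * (b ** i) for i in range(K + 1)]

# Using the green symbol g from the reported alphabet {g, b}:
base = build_base_strings("g")
print(len(base))      # 9 strings, one per exponent i = 0..8
print(len(base[-1]))  # longest string has length 2**8 = 256
```

With b = 2 and K = 8 this yields lengths 1, 2, 4, ..., 256, consistent with "the longest string is σ^256".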