Successor Feature Sets: Generalizing Successor Representations Across Policies
Authors: Kianté Brantley, Soroush Mehri, Geoff J. Gordon
Pages: 11774-11781
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments to explore which of the potential barriers to scaling are most pressing. ... Experiments: Dynamic Programming We tried our dynamic programming method on several small domains: the classic mountain-car domain and a random 18 x 18 gridworld with full and partial observability. We evaluated both planning and feature matching; results for the former are discussed in this section, and an example of the latter is in Fig. 3. |
| Researcher Affiliation | Collaboration | Kianté Brantley,¹ Soroush Mehri,² Geoffrey J. Gordon² ¹University of Maryland, College Park ²Microsoft Research |
| Pseudocode | Yes | Algorithm 1: Feature Matching Policy |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code for the described methodology. |
| Open Datasets | No | The paper uses simulated environments ('mountain-car domain' and '18x18 gridworld') but does not specify or provide access to a distinct, publicly available dataset in the format required (e.g., a link, DOI, or formal citation for a pre-existing dataset). |
| Dataset Splits | No | The paper does not provide specific details on training, validation, or test dataset splits (e.g., exact percentages or sample counts), nor does it reference predefined splits with citations. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or cloud instance types) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies or their version numbers necessary for replication. |
| Experiment Setup | Yes | In mountain-car, the agent has two actions: accelerate left and accelerate right. The state is (position, velocity), in [-1.2, 0.6] x [-0.07, 0.07]. We discretize to a 12x12 mesh with piecewise-constant approximation. Our one-step features are radial basis functions of the state, with values in [0, 1]. We use 9 RBF centers evenly spaced on a 3x3 grid. ... In the MDP gridworld, the agent has four deterministic actions: up, down, left, and right. The one-step features are (x, y) coordinates scaled to [-1, 1], similar to Fig. 3. In the POMDP gridworld, the actions are stochastic, and the agent only sees a noisy indicator of state. In all domains, the discount is γ = 0.9. ... We evaluate directions m_i that we optimized for during backups, as well as new random directions. ... this persistent error is due to our limited-size representation of Φ. The error decreases as we increase the number of boundary points that we store. |
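
The mountain-car setup quoted in the last row can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code (none was released): the Gaussian kernel, the `WIDTH` value, the placement of the 3x3 centers on the corners, edges and middle of the normalized state space, and all function names are hypothetical choices. The state bounds, the 12x12 mesh, the 9 centers and γ = 0.9 come from the quoted text. The last function shows the standard discounted sum of one-step features along a single rollout, which is the quantity a successor representation estimates in expectation.

```python
import math

# Values quoted in the Experiment Setup row.
POS_RANGE = (-1.2, 0.6)    # position bounds
VEL_RANGE = (-0.07, 0.07)  # velocity bounds
MESH = 12                  # 12x12 piecewise-constant mesh
GAMMA = 0.9                # discount used in all domains

# Assumptions (not stated in the source): Gaussian RBFs of this width,
# with centers on a 3x3 grid spanning the normalized unit square.
WIDTH = 0.5
CENTERS = [(cx, cy) for cx in (0.0, 0.5, 1.0) for cy in (0.0, 0.5, 1.0)]


def normalize(state):
    """Rescale (position, velocity) to the unit square [0, 1] x [0, 1]."""
    pos, vel = state
    u = (pos - POS_RANGE[0]) / (POS_RANGE[1] - POS_RANGE[0])
    v = (vel - VEL_RANGE[0]) / (VEL_RANGE[1] - VEL_RANGE[0])
    return u, v


def cell_index(state):
    """Map a continuous state to its (position, velocity) mesh cell."""
    u, v = normalize(state)
    # Clamp so that states on the upper boundary fall in the last cell.
    clamp = lambda frac: min(MESH - 1, max(0, int(frac * MESH)))
    return clamp(u), clamp(v)


def rbf_features(state):
    """Nine Gaussian RBF features, each with a value in (0, 1]."""
    u, v = normalize(state)
    return [
        math.exp(-((u - cx) ** 2 + (v - cy) ** 2) / (2 * WIDTH ** 2))
        for cx, cy in CENTERS
    ]


def discounted_features(trajectory):
    """Sum of GAMMA**t * features(s_t) along one rollout of states."""
    total = [0.0] * len(CENTERS)
    for t, state in enumerate(trajectory):
        weight = GAMMA ** t
        for i, value in enumerate(rbf_features(state)):
            total[i] += weight * value
    return total
```

For example, `cell_index((-1.2, -0.07))` gives the first cell `(0, 0)`, and `rbf_features` returns a list of nine values for any state inside the bounds. Averaging `discounted_features` over many rollouts of a fixed policy would give a Monte Carlo estimate of that policy's successor features under these assumed one-step features.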