Policy Caches with Successor Features
Authors: Mark Nemecek, Ronald Parr
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate this new method in three environments. Two of them, Gridworld and Reacher, involve the agent interacting with objects. The former is a discrete gridworld which involves collecting objects before reaching a goal, and the latter is a robotic arm simulation which requires touching targets scattered within reach of the arm. The remaining environment, Terrainworld, is based on a grid where each cell contains terrain which regulates the reward for moving to that cell. In our experiments, the reward features are provided to the agent. We train the agent on each task independently in order to avoid the confounding effect of training using an ensemble approach such as GPI. (A minimal reward-feature sketch follows the table.) |
| Researcher Affiliation | Academia | 1Department of Computer Science, Duke University, Durham, North Carolina, USA. Correspondence to: Mark Nemecek <markn@cs.duke.edu>. |
| Pseudocode | Yes | Algorithm 1 Learn New Policy? ... Algorithm 2 Calc Upper Bound (a hedged sketch of this style of decision rule follows the table) |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code for their methodology, nor does it include a link to a code repository. |
| Open Datasets | Yes | In our first environment, inspired by Barreto et al. (2017)... Our last environment, shown in Figure 1(c), is a modification of Reacher-v2 from Open AI Gym (Brockman et al., 2016) |
| Dataset Splits | No | The paper does not explicitly provide details about training/validation/test dataset splits (e.g., percentages, sample counts, or references to predefined splits) for the data used in their experiments. It describes how tasks were generated but not data splits for model training or validation. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions using "Open AI Gym" and a modified version of "SFQL", but does not specify version numbers for these or any other software dependencies like programming languages or libraries. |
| Experiment Setup | Yes | More detail on hyperparameters, network structures, and training procedures is in the appendix. ... Appendix A.3: The network was trained using the Adam optimizer with a learning rate of 0.001, a batch size of 256, and trained for 100,000 steps. (A minimal training-loop sketch follows the table.) |
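
The paper states that reward features are provided to the agent. In the successor-features setting it builds on (Barreto et al., 2017), rewards are linear in such features, r(s, a, s') = φ(s, a, s') · w. The sketch below illustrates what that could look like for a Terrainworld-style grid; the one-hot feature choice, the number of terrain types, and the weight values are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

# Hypothetical illustration: in the successor-features setting (Barreto et al., 2017),
# reward is linear in provided features, r(s, a, s') = phi(s, a, s') . w.
# For a Terrainworld-style grid, phi could be a one-hot indicator of the terrain type
# entered, and a task is defined by a weight vector w over terrain types.

N_TERRAIN_TYPES = 4  # assumption: number of terrain types

def terrain_features(terrain_type: int) -> np.ndarray:
    """One-hot reward features for moving onto a cell of the given terrain type."""
    phi = np.zeros(N_TERRAIN_TYPES)
    phi[terrain_type] = 1.0
    return phi

# A task is a weight vector over terrain types (values made up for illustration).
w_task = np.array([0.0, -1.0, -5.0, +10.0])

# Reward for stepping onto a cell of terrain type 2 under this task.
reward = float(terrain_features(2) @ w_task)  # -> -5.0
```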
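The pseudocode row names Algorithm 1 ("Learn New Policy?") and Algorithm 2 ("Calc Upper Bound") but does not reproduce them. The following is only a hedged sketch of that kind of decision: it evaluates cached policies on a new task via their successor features and compares the best reuse value against a caller-supplied upper bound on the optimal value. The paper's actual bound computation (Algorithm 2) is not reproduced here, and all names are hypothetical.

```python
import numpy as np

def cached_policy_values(psi_cache, w_new):
    """Evaluate each cached policy on a new task via successor features.

    psi_cache: list of successor-feature tables, one per cached policy,
               each of shape (n_states, n_actions, n_features), so that
               Q_i(s, a) on task w is psi_i[s, a] . w.
    w_new:     reward-weight vector of the new task.
    Returns an array of shape (n_policies, n_states, n_actions).
    """
    return np.stack([psi @ w_new for psi in psi_cache])

def should_learn_new_policy(psi_cache, w_new, upper_bound, start_state, tolerance):
    """Hedged sketch of a 'Learn New Policy?'-style decision (names hypothetical).

    Compares the best value achievable by reusing cached policies (GPI: max over
    cached policies and actions) against an upper bound on the optimal value of
    the new task. If the possible improvement exceeds the tolerance, learning a
    dedicated policy may be worthwhile.
    """
    q_cached = cached_policy_values(psi_cache, w_new)   # (n_policies, S, A)
    gpi_value = q_cached[:, start_state, :].max()       # best reuse value at start
    return (upper_bound - gpi_value) > tolerance
```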
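The experiment-setup row reports the optimizer, learning rate, batch size, and step count from the paper's appendix. The sketch below wires those reported values into a minimal PyTorch training loop; the network architecture, input/output sizes, and data source are assumptions (placeholders), since they are not specified in the excerpt.

```python
import torch
import torch.nn as nn

# Hyperparameters reported in the paper's appendix; everything else
# (network width/depth, input/output sizes, data source) is assumed.
LEARNING_RATE = 1e-3
BATCH_SIZE = 256
TRAIN_STEPS = 100_000

STATE_DIM, N_OUTPUTS = 8, 4            # assumption: placeholder dimensions
network = nn.Sequential(               # assumption: a small MLP stand-in
    nn.Linear(STATE_DIM, 64), nn.ReLU(),
    nn.Linear(64, N_OUTPUTS),
)
optimizer = torch.optim.Adam(network.parameters(), lr=LEARNING_RATE)
loss_fn = nn.MSELoss()

for step in range(TRAIN_STEPS):
    # Placeholder batch; in practice this would come from the agent's collected data.
    states = torch.randn(BATCH_SIZE, STATE_DIM)
    targets = torch.randn(BATCH_SIZE, N_OUTPUTS)

    loss = loss_fn(network(states), targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```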