Policy Caches with Successor Features

Authors: Mark Nemecek, Ronald Parr

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate this new method in three environments. Two of them, Gridworld and Reacher, involve the agent interacting with objects. The former is a discrete gridworld which involves collecting objects before reaching a goal, and the latter is a robotic arm simulation which requires touching targets scattered within reach of the arm. The remaining environment, Terrainworld, is based around a grid where each cell contains terrain which regulates the reward for moving to that cell. In our experiments, the reward features are provided to the agent. We train the agent on each task independently in order to avoid the confounding effect of training using an ensemble approach such as GPI. (An illustrative sketch of the reward-feature setup appears after this table.)
Researcher Affiliation | Academia | Department of Computer Science, Duke University, Durham, North Carolina, USA. Correspondence to: Mark Nemecek <markn@cs.duke.edu>.
Pseudocode | Yes | Algorithm 1 "Learn New Policy?" ... Algorithm 2 "Calc Upper Bound" (A hedged sketch of the cache-decision pattern these algorithms describe appears after this table.)
Open Source Code | No | The paper does not provide an explicit statement about releasing source code for its methodology, nor does it include a link to a code repository.
Open Datasets | Yes | In our first environment, inspired by Barreto et al. (2017)... Our last environment, shown in Figure 1(c), is a modification of Reacher-v2 from Open AI Gym (Brockman et al., 2016).
Dataset Splits | No | The paper does not explicitly provide details about training/validation/test dataset splits (e.g., percentages, sample counts, or references to predefined splits) for the data used in its experiments. It describes how tasks were generated but not data splits for model training or validation.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions using "OpenAI Gym" and a modified version of "SFQL", but does not specify version numbers for these or any other software dependencies such as programming languages or libraries.
Experiment Setup | Yes | More detail on hyperparameters, network structures, and training procedures are in the appendix. ... Appendix A.3: The network was trained using the Adam optimizer with a learning rate of 0.001, a batch size of 256, and trained for 100,000 steps. (A hedged training-loop sketch using these reported hyperparameters appears after this table.)
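
The reward features mentioned under Research Type follow the standard successor-feature setup of Barreto et al. (2017), which the paper builds on, where each task is a weight vector and reward is linear in the provided features. The sketch below is a minimal illustration of that decomposition only; the feature dimension, names, and example values are assumptions, not taken from the paper.

```python
import numpy as np

# Minimal sketch of the linear reward decomposition used with successor
# features: r(s, a) = phi(s, a) . w, where phi(s, a) are the reward features
# the environments provide to the agent and w defines a particular task.
# The dimension and example values below are illustrative assumptions.

def reward(phi_sa: np.ndarray, w: np.ndarray) -> float:
    """Task reward as a dot product of reward features and task weights."""
    return float(phi_sa @ w)

phi_sa = np.array([1.0, 0.0, 0.0, 1.0])   # e.g., indicators for object types touched
w_task = np.array([0.5, -1.0, 0.0, 2.0])  # how much this task values each feature
print(reward(phi_sa, w_task))             # 2.5
```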
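
Algorithms 1 ("Learn New Policy?") and 2 ("Calc Upper Bound") are only named in the excerpt above, so the sketch below shows the general policy-cache pattern they suggest rather than the paper's exact procedure: a GPI-style lower bound from cached policies is compared against an upper bound on the optimal value, and a new policy is learned only when the gap is too large. The function names, the threshold epsilon, and the placeholder upper_bound_fn are assumptions.

```python
import numpy as np

def gpi_lower_bound(psi_cache, w: np.ndarray) -> float:
    """Best value promised by any cached policy on task w.

    psi_cache[i] holds successor features psi^{pi_i}(s0) of cached policy
    pi_i at the start state, so psi @ w is that policy's value on task w.
    """
    return max(float(psi @ w) for psi in psi_cache)

def should_learn_new_policy(psi_cache, w, upper_bound_fn, epsilon=0.1) -> bool:
    """Learn a new policy only if the cache may be too suboptimal on task w.

    upper_bound_fn(w) stands in for the paper's "Calc Upper Bound" routine;
    it is left as an argument because the exact bound is not reproduced here.
    """
    lower = gpi_lower_bound(psi_cache, w)
    upper = upper_bound_fn(w)
    return upper - lower > epsilon

# Illustrative usage with two cached policies and a new task.
psi_cache = [np.array([1.0, 0.0, 2.0]), np.array([0.0, 1.5, 0.5])]
w_new = np.array([0.2, 1.0, 0.3])
print(should_learn_new_policy(psi_cache, w_new, upper_bound_fn=lambda w: 2.0))  # True
```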
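
The Experiment Setup row reports only the optimizer, learning rate, batch size, and step count from Appendix A.3. A minimal PyTorch sketch wiring those reported values into a generic training loop follows; the network architecture, the data sampling, and the loss are placeholders, not the paper's.

```python
import torch
import torch.nn as nn

# Hyperparameters reported in Appendix A.3 of the paper.
LEARNING_RATE = 0.001
BATCH_SIZE = 256
TRAIN_STEPS = 100_000

# Placeholder network and data: the paper's actual architecture, inputs, and
# loss are described in its appendix and are not reproduced here.
net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 4))
optimizer = torch.optim.Adam(net.parameters(), lr=LEARNING_RATE)

def sample_batch(batch_size: int):
    """Stand-in for sampling transitions from a replay buffer."""
    return torch.randn(batch_size, 8), torch.randn(batch_size, 4)

for step in range(TRAIN_STEPS):
    x, target = sample_batch(BATCH_SIZE)
    loss = nn.functional.mse_loss(net(x), target)  # placeholder loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```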