Policy Caches with Successor Features

Authors: Mark Nemecek, Ronald Parr

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate this new method in three environments. Two of them, Gridworld and Reacher, involve the agent interacting with objects. The former is a discrete gridworld which involves collecting objects before reaching a goal, and the latter is a robotic arm simulation which requires touching targets scattered within reach of the arm. The remaining environment, Terrainworld, is based around a grid where each cell contains terrain which regulates the reward for moving to that cell. In our experiments, the reward features are provided to the agent. We train the agent on each task independently in order to avoid the confounding effect of training using an ensemble approach such as GPI. (An illustrative sketch of the reward-feature setup appears after this table.)
Researcher Affiliation | Academia | Department of Computer Science, Duke University, Durham, North Carolina, USA. Correspondence to: Mark Nemecek <markn@cs.duke.edu>.
Pseudocode | Yes | Algorithm 1 "Learn New Policy?" ... Algorithm 2 "Calc Upper Bound" (A hedged sketch of the cache-decision pattern these algorithms describe appears after this table.)
Open Source Code | No | The paper does not provide an explicit statement about releasing source code for its methodology, nor does it include a link to a code repository.
Open Datasets | Yes | In our first environment, inspired by Barreto et al. (2017)... Our last environment, shown in Figure 1(c), is a modification of Reacher-v2 from Open AI Gym (Brockman et al., 2016).
Dataset Splits | No | The paper does not explicitly provide details about training/validation/test dataset splits (e.g., percentages, sample counts, or references to predefined splits) for the data used in its experiments. It describes how tasks were generated but not data splits for model training or validation.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions using "OpenAI Gym" and a modified version of "SFQL", but does not specify version numbers for these or any other software dependencies such as programming languages or libraries.
Experiment Setup | Yes | More detail on hyperparameters, network structures, and training procedures are in the appendix. ... Appendix A.3: The network was trained using the Adam optimizer with a learning rate of 0.001, a batch size of 256, and trained for 100,000 steps. (A hedged training-loop sketch using these reported hyperparameters appears after this table.)
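
The reward features mentioned under Research Type follow the standard successor-feature setup of Barreto et al. (2017), which the paper builds on, where each task is a weight vector and reward is linear in the provided features. The sketch below is a minimal illustration of that decomposition only; the feature dimension, names, and example values are assumptions, not taken from the paper.

```python
import numpy as np

# Minimal sketch of the linear reward decomposition used with successor
# features: r(s, a) = phi(s, a) . w, where phi(s, a) are the reward features
# the environments provide to the agent and w defines a particular task.
# The dimension and example values below are illustrative assumptions.

def reward(phi_sa: np.ndarray, w: np.ndarray) -> float:
    """Task reward as a dot product of reward features and task weights."""
    return float(phi_sa @ w)

phi_sa = np.array([1.0, 0.0, 0.0, 1.0])   # e.g., indicators for object types touched
w_task = np.array([0.5, -1.0, 0.0, 2.0])  # how much this task values each feature
print(reward(phi_sa, w_task))             # 2.5
```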
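
Algorithms 1 ("Learn New Policy?") and 2 ("Calc Upper Bound") are only named in the excerpt above, so the sketch below shows the general policy-cache pattern they suggest rather than the paper's exact procedure: a GPI-style lower bound from cached policies is compared against an upper bound on the optimal value, and a new policy is learned only when the gap is too large. The function names, the threshold epsilon, and the placeholder upper_bound_fn are assumptions.

```python
import numpy as np

def gpi_lower_bound(psi_cache, w: np.ndarray) -> float:
    """Best value promised by any cached policy on task w.

    psi_cache[i] holds successor features psi^{pi_i}(s0) of cached policy
    pi_i at the start state, so psi @ w is that policy's value on task w.
    """
    return max(float(psi @ w) for psi in psi_cache)

def should_learn_new_policy(psi_cache, w, upper_bound_fn, epsilon=0.1) -> bool:
    """Learn a new policy only if the cache may be too suboptimal on task w.

    upper_bound_fn(w) stands in for the paper's "Calc Upper Bound" routine;
    it is left as an argument because the exact bound is not reproduced here.
    """
    lower = gpi_lower_bound(psi_cache, w)
    upper = upper_bound_fn(w)
    return upper - lower > epsilon

# Illustrative usage with two cached policies and a new task.
psi_cache = [np.array([1.0, 0.0, 2.0]), np.array([0.0, 1.5, 0.5])]
w_new = np.array([0.2, 1.0, 0.3])
print(should_learn_new_policy(psi_cache, w_new, upper_bound_fn=lambda w: 2.0))  # True
```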
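
The Experiment Setup row reports only the optimizer, learning rate, batch size, and step count from Appendix A.3. A minimal PyTorch sketch wiring those reported values into a generic training loop follows; the network architecture, the data sampling, and the loss are placeholders, not the paper's.

```python
import torch
import torch.nn as nn

# Hyperparameters reported in Appendix A.3 of the paper.
LEARNING_RATE = 0.001
BATCH_SIZE = 256
TRAIN_STEPS = 100_000

# Placeholder network and data: the paper's actual architecture, inputs, and
# loss are described in its appendix and are not reproduced here.
net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 4))
optimizer = torch.optim.Adam(net.parameters(), lr=LEARNING_RATE)

def sample_batch(batch_size: int):
    """Stand-in for sampling transitions from a replay buffer."""
    return torch.randn(batch_size, 8), torch.randn(batch_size, 4)

for step in range(TRAIN_STEPS):
    x, target = sample_batch(BATCH_SIZE)
    loss = nn.functional.mse_loss(net(x), target)  # placeholder loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```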