Recall Traces: Backtracking Models for Efficient Reinforcement Learning

Authors: Anirudh Goyal, Philemon Brakel, William Fedus, Soumye Singhal, Timothy Lillicrap, Sergey Levine, Hugo Larochelle, Yoshua Bengio

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our method improves the sample efficiency of both on- and off-policy RL algorithms across several environments and tasks. ... Empirically, we show with experiments on eight RL environments that the proposed approach is more sample efficient.
Researcher Affiliation | Collaboration | Mila, University of Montreal; Google DeepMind; Google Brain; IIT Kanpur; University of California, Berkeley.
Pseudocode | Yes | Algorithm 1: Improve Policy via Recall Traces and Backtracking Model; Algorithm 2: Produce High Value States (a backward-sampling sketch based on Algorithm 2 follows this table).
Open Source Code | No | The paper does not contain an explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | To investigate this, we use the four-room environment from Schaul et al. (2015)... We performed experiments on the U-Maze Ant task described in Held et al. (2017). ... We conducted robotic locomotion experiments using the MuJoCo simulator (Todorov et al., 2012). ... We compare to Soft Actor Critic (SAC) (Haarnoja et al., 2018), shown to be more sample efficient compared to other off-policy algorithms such as DDPG (Lillicrap et al., 2015).
Dataset Splits | No | The paper does not explicitly describe dataset splits (e.g., percentages, counts, or specific predefined splits) for training, validation, or testing.
Hardware Specification | No | The paper does not explicitly mention any specific hardware details such as GPU models, CPU models, or cloud computing instance types used for running the experiments.
Software Dependencies | No | The paper mentions software like the "MuJoCo simulator", "rllab implementation of trust region policy optimization (TRPO)", and "Soft Actor Critic (SAC)", but it does not provide specific version numbers for these software components.
Experiment Setup | Yes | The backtracking model we used for all the experiments consisted of two multi-layer perceptrons: one for the backward action predictor Q(a_t | s_{t+1}) and one for the backward state predictor Q(s_t | a_t, s_{t+1}). Both MLPs had two hidden layers of 128 units. ... We do about a hundred training-steps of the backtracking model for every 5 training-steps of the RL algorithm. ... Table 3: Hyperparameters for the PER Implementation (Environment Size, Batch-size, Num. of Actor Critic steps per PER step, PER α, PER β). (A code sketch of this two-MLP architecture also follows the table.)
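The Experiment Setup row describes the backtracking model as two MLPs, each with two hidden layers of 128 units: a backward action predictor Q(a_t | s_{t+1}) and a backward state predictor Q(s_t | a_t, s_{t+1}). Below is a minimal PyTorch sketch of that architecture; the Gaussian output heads and the log-std clamping range are our assumptions for continuous-control tasks, not details quoted from the paper.

```python
import torch
import torch.nn as nn


class BackwardActionPredictor(nn.Module):
    """Q(a_t | s_{t+1}): predicts the action that led into a given state.

    Two hidden layers of 128 units, as in the paper's setup; the Gaussian
    head (mean + log-std) is an assumption for continuous actions.
    """
    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mean = nn.Linear(hidden, action_dim)
        self.log_std = nn.Linear(hidden, action_dim)

    def forward(self, next_state):
        h = self.body(next_state)
        return self.mean(h), self.log_std(h).clamp(-5.0, 2.0)


class BackwardStatePredictor(nn.Module):
    """Q(s_t | a_t, s_{t+1}): predicts the previous state from the backward
    action and the successor state. Same two-hidden-layer MLP structure."""
    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mean = nn.Linear(hidden, state_dim)
        self.log_std = nn.Linear(hidden, state_dim)

    def forward(self, action, next_state):
        h = self.body(torch.cat([action, next_state], dim=-1))
        return self.mean(h), self.log_std(h).clamp(-5.0, 2.0)
```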
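The Pseudocode row points to Algorithm 2, "Produce High Value States". The helper below is a hedged illustration of how recall traces could be generated with the two networks sketched above: starting from a high-value state, repeatedly sample a backward action and then a backward state to build a short trace in reverse. The trace length, the reparameterized Gaussian sampling, and the helper name are assumptions; the paper's Algorithm 2 defines the exact procedure.

```python
import torch


def generate_recall_trace(high_value_state, action_model, state_model, length=10):
    """Walk backward from a high-value state using the backtracking model.

    Returns a list of (state, action) pairs in chronological order, which can
    then serve as imitation targets for the policy. Hypothetical helper for
    illustration; trace length and sampling scheme are assumptions.
    """
    trace = []
    next_state = high_value_state
    with torch.no_grad():
        for _ in range(length):
            # Sample a backward action a_t ~ Q(a_t | s_{t+1})
            a_mean, a_log_std = action_model(next_state)
            action = a_mean + torch.randn_like(a_mean) * a_log_std.exp()
            # Sample the preceding state s_t ~ Q(s_t | a_t, s_{t+1})
            s_mean, s_log_std = state_model(action, next_state)
            state = s_mean + torch.randn_like(s_mean) * s_log_std.exp()
            trace.append((state, action))
            next_state = state
    trace.reverse()  # earliest state first
    return trace
```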