Recall Traces: Backtracking Models for Efficient Reinforcement Learning

Authors: Anirudh Goyal, Philemon Brakel, William Fedus, Soumye Singhal, Timothy Lillicrap, Sergey Levine, Hugo Larochelle, Yoshua Bengio

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our method improves the sample efficiency of both on- and off-policy RL algorithms across several environments and tasks. ... Empirically, we show with experiments on eight RL environments that the proposed approach is more sample efficient.
Researcher Affiliation | Collaboration | Mila, University of Montreal; Google DeepMind; Google Brain; IIT Kanpur; University of California, Berkeley.
Pseudocode | Yes | Algorithm 1: Improve Policy via Recall Traces and Backtracking Model; Algorithm 2: Produce High Value States (a backward-sampling sketch based on Algorithm 2 follows this table).
Open Source Code | No | The paper does not contain an explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | To investigate this, we use the four-room environment from Schaul et al. (2015)... We performed experiments on the U-Maze Ant task described in Held et al. (2017). ... We conducted robotic locomotion experiments using the MuJoCo simulator (Todorov et al., 2012). ... We compare to Soft Actor Critic (SAC) (Haarnoja et al., 2018), shown to be more sample efficient compared to other off-policy algorithms such as DDPG (Lillicrap et al., 2015).
Dataset Splits | No | The paper does not explicitly describe dataset splits (e.g., percentages, counts, or specific predefined splits) for training, validation, or testing.
Hardware Specification | No | The paper does not explicitly mention any specific hardware details such as GPU models, CPU models, or cloud computing instance types used for running the experiments.
Software Dependencies | No | The paper mentions software like the "MuJoCo simulator", "rllab implementation of trust region policy optimization (TRPO)", and "Soft Actor Critic (SAC)", but it does not provide specific version numbers for these software components.
Experiment Setup | Yes | The backtracking model we used for all the experiments consisted of two multi-layer perceptrons: one for the backward action predictor Q(a_t | s_{t+1}) and one for the backward state predictor Q(s_t | a_t, s_{t+1}). Both MLPs had two hidden layers of 128 units. ... We do about a hundred training-steps of the backtracking model for every 5 training-steps of the RL algorithm. ... Table 3: Hyperparameters for the PER Implementation (Environment Size, Batch-size, Num. of Actor Critic steps per PER step, PER α, PER β). (A code sketch of this two-MLP architecture also follows the table.)
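The Experiment Setup row describes the backtracking model as two MLPs, each with two hidden layers of 128 units: a backward action predictor Q(a_t | s_{t+1}) and a backward state predictor Q(s_t | a_t, s_{t+1}). Below is a minimal PyTorch sketch of that architecture; the Gaussian output heads and the log-std clamping range are our assumptions for continuous-control tasks, not details quoted from the paper.

```python
import torch
import torch.nn as nn


class BackwardActionPredictor(nn.Module):
    """Q(a_t | s_{t+1}): predicts the action that led into a given state.

    Two hidden layers of 128 units, as in the paper's setup; the Gaussian
    head (mean + log-std) is an assumption for continuous actions.
    """
    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mean = nn.Linear(hidden, action_dim)
        self.log_std = nn.Linear(hidden, action_dim)

    def forward(self, next_state):
        h = self.body(next_state)
        return self.mean(h), self.log_std(h).clamp(-5.0, 2.0)


class BackwardStatePredictor(nn.Module):
    """Q(s_t | a_t, s_{t+1}): predicts the previous state from the backward
    action and the successor state. Same two-hidden-layer MLP structure."""
    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mean = nn.Linear(hidden, state_dim)
        self.log_std = nn.Linear(hidden, state_dim)

    def forward(self, action, next_state):
        h = self.body(torch.cat([action, next_state], dim=-1))
        return self.mean(h), self.log_std(h).clamp(-5.0, 2.0)
```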
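The Pseudocode row points to Algorithm 2, "Produce High Value States". The helper below is a hedged illustration of how recall traces could be generated with the two networks sketched above: starting from a high-value state, repeatedly sample a backward action and then a backward state to build a short trace in reverse. The trace length, the reparameterized Gaussian sampling, and the helper name are assumptions; the paper's Algorithm 2 defines the exact procedure.

```python
import torch


def generate_recall_trace(high_value_state, action_model, state_model, length=10):
    """Walk backward from a high-value state using the backtracking model.

    Returns a list of (state, action) pairs in chronological order, which can
    then serve as imitation targets for the policy. Hypothetical helper for
    illustration; trace length and sampling scheme are assumptions.
    """
    trace = []
    next_state = high_value_state
    with torch.no_grad():
        for _ in range(length):
            # Sample a backward action a_t ~ Q(a_t | s_{t+1})
            a_mean, a_log_std = action_model(next_state)
            action = a_mean + torch.randn_like(a_mean) * a_log_std.exp()
            # Sample the preceding state s_t ~ Q(s_t | a_t, s_{t+1})
            s_mean, s_log_std = state_model(action, next_state)
            state = s_mean + torch.randn_like(s_mean) * s_log_std.exp()
            trace.append((state, action))
            next_state = state
    trace.reverse()  # earliest state first
    return trace
```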