Hindsight Task Relabelling: Experience Replay for Sparse Reward Meta-RL

Authors: Charles Packer, Pieter Abbeel, Joseph E. Gonzalez

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the effectiveness of our approach on a suite of challenging sparse reward goal-reaching environments that previously required dense reward during meta-training to solve. Our approach solves these environments using the true sparse reward function, with performance comparable to training with a proxy dense reward function.
Researcher Affiliation | Academia | Charles Packer (UC Berkeley); Pieter Abbeel (UC Berkeley); Joseph E. Gonzalez (UC Berkeley)
Pseudocode | Yes | Algorithm 1: Hindsight Task Relabelling for Off-Policy Meta-Reinforcement Learning (a minimal sketch of the relabelling step appears after this table)
Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the described methodology.
Open Datasets | Yes | We evaluate our method on a suite of sparse reward environments based on those proposed by Gupta et al. (2018b) and Rakelly et al. (2019) (see Figure 2).
Dataset Splits | Yes | In each environment, a set of 100 tasks is sampled for meta-training, and a set of 100 tasks is sampled from the same task distribution for meta-testing.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper does not provide specific software dependency details (e.g., library names with version numbers) needed to replicate the experiment.
Experiment Setup | Yes | In our experiments we found a relatively low relabeling probability (e.g., K = 0.1 and 0.3) was often most effective for HTR (see Figure 8). Refer to the supplement for further details on the experimental setup.
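To illustrate the Pseudocode and Experiment Setup rows above, the following is a minimal sketch of the core idea behind hindsight task relabelling: with some probability K, experience stored for one task is relabelled as if it had been collected for a "hindsight" task that the agent actually achieved, and the sparse reward is recomputed under that new task. The paper's algorithm operates on full trajectories and on both the RL batch and the context batch of an off-policy meta-RL learner (PEARL); the per-transition version below, the function names, the `relabel_prob` parameter, and the goal-reaching reward shape are illustrative assumptions, not the authors' implementation (which, per the table, is not open-sourced).

```python
import random

def sparse_goal_reward(achieved_goal, task_goal, radius=0.2):
    """Sparse goal-reaching reward: 1.0 inside a small radius of the goal, else 0.0.
    (Illustrative reward shape; the paper's environments define their own sparse rewards.)"""
    dist = sum((a - g) ** 2 for a, g in zip(achieved_goal, task_goal)) ** 0.5
    return 1.0 if dist <= radius else 0.0

def hindsight_task_relabel(batch, relabel_prob=0.1):
    """Relabel a fraction of sampled transitions with hindsight tasks.

    Each transition is assumed to be a dict with keys 'obs', 'action',
    'achieved_goal', 'task_goal', and 'reward'. With probability `relabel_prob`,
    the transition's task goal is replaced by a goal the agent actually reached
    elsewhere in the batch, and the sparse reward is recomputed under that task.
    This is a hedged sketch of the relabelling step, not the authors' code.
    """
    achieved = [t["achieved_goal"] for t in batch]
    relabeled = []
    for t in batch:
        t = dict(t)  # copy so the stored replay data is not mutated in place
        if random.random() < relabel_prob:
            new_goal = random.choice(achieved)  # hindsight task drawn from achieved outcomes
            t["task_goal"] = new_goal
            t["reward"] = sparse_goal_reward(t["achieved_goal"], new_goal)
        relabeled.append(t)
    return relabeled
```

In this sketch, `relabel_prob` plays the role of the relabeling probability K quoted in the Experiment Setup row (the paper reports that relatively low values such as 0.1 and 0.3 were often most effective).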