Hindsight Task Relabelling: Experience Replay for Sparse Reward Meta-RL

Authors: Charles Packer, Pieter Abbeel, Joseph E. Gonzalez

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the effectiveness of our approach on a suite of challenging sparse reward goal-reaching environments that previously required dense reward during meta-training to solve. Our approach solves these environments using the true sparse reward function, with performance comparable to training with a proxy dense reward function.
Researcher Affiliation | Academia | Charles Packer (UC Berkeley); Pieter Abbeel (UC Berkeley); Joseph E. Gonzalez (UC Berkeley)
Pseudocode | Yes | Algorithm 1: Hindsight Task Relabelling for Off-Policy Meta-Reinforcement Learning (a minimal sketch of the relabelling step appears after this table)
Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the described methodology.
Open Datasets | Yes | We evaluate our method on a suite of sparse reward environments based on those proposed by Gupta et al. (2018b) and Rakelly et al. (2019) (see Figure 2).
Dataset Splits | Yes | In each environment, a set of 100 tasks is sampled for meta-training, and a set of 100 tasks is sampled from the same task distribution for meta-testing.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper does not provide specific software dependency details (e.g., library names with version numbers) needed to replicate the experiment.
Experiment Setup | Yes | In our experiments we found a relatively low relabeling probability (e.g., K = 0.1 and 0.3) was often most effective for HTR (see Figure 8). Refer to the supplement for further details on the experimental setup.
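To illustrate the Pseudocode and Experiment Setup rows above, the following is a minimal sketch of the core idea behind hindsight task relabelling: with some probability K, experience stored for one task is relabelled as if it had been collected for a "hindsight" task that the agent actually achieved, and the sparse reward is recomputed under that new task. The paper's algorithm operates on full trajectories and on both the RL batch and the context batch of an off-policy meta-RL learner (PEARL); the per-transition version below, the function names, the `relabel_prob` parameter, and the goal-reaching reward shape are illustrative assumptions, not the authors' implementation (which, per the table, is not open-sourced).

```python
import random

def sparse_goal_reward(achieved_goal, task_goal, radius=0.2):
    """Sparse goal-reaching reward: 1.0 inside a small radius of the goal, else 0.0.
    (Illustrative reward shape; the paper's environments define their own sparse rewards.)"""
    dist = sum((a - g) ** 2 for a, g in zip(achieved_goal, task_goal)) ** 0.5
    return 1.0 if dist <= radius else 0.0

def hindsight_task_relabel(batch, relabel_prob=0.1):
    """Relabel a fraction of sampled transitions with hindsight tasks.

    Each transition is assumed to be a dict with keys 'obs', 'action',
    'achieved_goal', 'task_goal', and 'reward'. With probability `relabel_prob`,
    the transition's task goal is replaced by a goal the agent actually reached
    elsewhere in the batch, and the sparse reward is recomputed under that task.
    This is a hedged sketch of the relabelling step, not the authors' code.
    """
    achieved = [t["achieved_goal"] for t in batch]
    relabeled = []
    for t in batch:
        t = dict(t)  # copy so the stored replay data is not mutated in place
        if random.random() < relabel_prob:
            new_goal = random.choice(achieved)  # hindsight task drawn from achieved outcomes
            t["task_goal"] = new_goal
            t["reward"] = sparse_goal_reward(t["achieved_goal"], new_goal)
        relabeled.append(t)
    return relabeled
```

In this sketch, `relabel_prob` plays the role of the relabeling probability K quoted in the Experiment Setup row (the paper reports that relatively low values such as 0.1 and 0.3 were often most effective).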