Hindsight Task Relabelling: Experience Replay for Sparse Reward Meta-RL
Authors: Charles Packer, Pieter Abbeel, Joseph E. Gonzalez
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of our approach on a suite of challenging sparse reward goal-reaching environments that previously required dense reward during meta-training to solve. Our approach solves these environments using the true sparse reward function, with performance comparable to training with a proxy dense reward function. |
| Researcher Affiliation | Academia | Charles Packer (UC Berkeley), Pieter Abbeel (UC Berkeley), Joseph E. Gonzalez (UC Berkeley) |
| Pseudocode | Yes | Algorithm 1: Hindsight Task Relabelling for Off-Policy Meta-Reinforcement Learning (a minimal sketch of the relabelling step appears after this table) |
| Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the described methodology. |
| Open Datasets | Yes | We evaluate our method on a suite of sparse reward environments based on those proposed by Gupta et al. (2018b) and Rakelly et al. (2019) (see Figure 2). |
| Dataset Splits | Yes | In each environment, a set of 100 tasks is sampled for meta-training, and a set of 100 tasks is sampled from the same task distribution for meta-testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependency details (e.g., library names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | In our experiments we found a relatively low relabeling probability (e.g., K = 0.1 and 0.3) was often most effective for HTR (see Figure 8). Refer to the supplement for further details on the experimental setup. |
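For orientation, the relabelling step named in the Pseudocode row (the paper's Algorithm 1) can be sketched roughly as follows. This is a minimal sketch, assuming a goal-reaching task distribution in which a task is a goal position and the true reward is a sparse reach indicator; the `sparse_reward` and `relabel_batch` names, the trajectory dictionary layout, and the use of a trajectory's final state as the hindsight goal are illustrative assumptions, not the authors' implementation. The `relabel_prob` argument plays the role of the relabeling probability (e.g., 0.1 or 0.3) quoted in the Experiment Setup row.

```python
import numpy as np

def sparse_reward(state, goal, radius=0.1):
    # Illustrative stand-in for a sparse goal-reaching reward:
    # 1 inside the goal radius, 0 everywhere else.
    return float(np.linalg.norm(state - goal) <= radius)

def relabel_batch(trajectories, relabel_prob=0.1, radius=0.1):
    # Hindsight task relabelling (sketch, not the paper's code): with
    # probability `relabel_prob`, swap a trajectory's original goal for a
    # state the agent actually visited (here, its final state), then
    # recompute the sparse rewards under the new goal so the relabelled
    # experience carries non-zero reward signal.
    relabelled = []
    for traj in trajectories:
        states, actions, goal = traj["states"], traj["actions"], traj["goal"]
        if np.random.rand() < relabel_prob:
            goal = states[-1]  # hindsight goal: a state the trajectory reached
        rewards = np.array([sparse_reward(s, goal, radius) for s in states[1:]])
        relabelled.append(
            {"states": states, "actions": actions, "goal": goal, "rewards": rewards}
        )
    return relabelled
```

Relabelling only a small fraction of each batch keeps most trajectories paired with their original task labels, which is consistent with the paper's observation that low relabeling probabilities were often most effective.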