Hindsight Foresight Relabeling for Meta-Reinforcement Learning

Authors: Michael Wan, Jian Peng, Tanmay Gangwani

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We find that HFR improves performance when compared to other relabeling methods on a variety of meta-RL tasks. We evaluate on a set of both sparse and dense reward MuJoCo environments (Todorov et al., 2012) modeled in OpenAI Gym (Brockman et al., 2016). Figure 6 plots the performance (average returns or success rate) on the held-out meta-test tasks on the y-axis, with the total timesteps of environment interaction for meta-training on the x-axis.
Researcher Affiliation | Academia | Michael Wan, Jian Peng & Tanmay Gangwani, University of Illinois at Urbana-Champaign
Pseudocode | Yes | Algorithm 1: Hindsight Foresight Relabeling (HFR). Algorithm 2: Computation of the utility function based on the Bellman error (Eq. 10), for PEARL-based meta-RL. (A hedged sketch of the relabeling step follows the table.)
Open Source Code | Yes | Code: https://www.github.com/michaelwan11/hfr
Open Datasets | Yes | We evaluate on a set of both sparse and dense reward MuJoCo environments (Todorov et al., 2012) modeled in OpenAI Gym (Brockman et al., 2016). Ant-Goal: We use the Ant-Goal task from Gupta et al. (2018). Ant-Vel: We use the Ant environment from OpenAI Gym. Cheetah-Highdim: We take the Cheetah-Highdim task from Lin et al. (2020). (A minimal environment-loading example follows the table.)
Dataset Splits | No | The paper provides details on 'Train Tasks' and 'Test Tasks' in Table 2, but there is no explicit mention of a separate validation split for hyperparameter tuning or early stopping. Standard practice in RL often folds validation into training or relies on held-out test tasks evaluated after training.
Hardware Specification | No | The paper does not specify the hardware (e.g., GPU models, CPU types, memory) used to run the experiments; it only refers to 'MuJoCo environments' and 'OpenAI Gym'.
Software Dependencies | No | Table 1 lists PEARL hyperparameters such as 'Nonlinearity: ReLU' and 'Optimizer: Adam', but does not provide version numbers for software dependencies such as Python, PyTorch, TensorFlow, or other libraries. It only names PEARL as the base algorithm.
Experiment Setup | Yes | Table 1: PEARL hyperparameters used for all experiments. Hyperparameter values: Nonlinearity ReLU, Optimizer Adam, Policy Learning Rate 3e-4, Q-function Learning Rate 3e-4, Batch Size 256, Replay Buffer Size 1e6. Table 2: Environment Details includes Discount, Horizon, Train Tasks, Test Tasks, and Number of Exploration Steps. (The Table 1 values are transcribed into a config sketch after the table.)
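
To make the Pseudocode row concrete, below is a minimal, hypothetical Python sketch of the relabeling step in the spirit of Algorithm 1: a collected trajectory is scored against every candidate task with a utility function (in the paper, Eq. 10 derives this utility from the Bellman error of the PEARL critic), and its rewards are rewritten for the highest-utility task. All names here (Task, relabel_trajectory, utility_fn) are illustrative assumptions and are not taken from the released code.

```python
# Hypothetical sketch of hindsight relabeling driven by a utility function,
# in the spirit of HFR's Algorithm 1; not the authors' implementation.
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# (state, action, reward, next_state); plain lists stand in for numpy arrays.
Transition = Tuple[list, list, float, list]


@dataclass
class Task:
    reward_fn: Callable[[list, list, list], float]          # task-specific reward r(s, a, s')
    buffer: List[Transition] = field(default_factory=list)  # stands in for a replay buffer


def relabel_trajectory(traj: List[Transition],
                       tasks: List[Task],
                       utility_fn: Callable[[List[Transition], Task], float]) -> Task:
    """Assign `traj` to the candidate task with the highest utility and
    rewrite its rewards with that task's reward function."""
    # In the paper the utility is computed from the Bellman error of the
    # PEARL critic (Eq. 10); here it is an abstract callable.
    best_task = max(tasks, key=lambda task: utility_fn(traj, task))
    relabeled = [(s, a, best_task.reward_fn(s, a, s2), s2) for (s, a, _, s2) in traj]
    best_task.buffer.extend(relabeled)
    return best_task
```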
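
The Open Datasets row lists the base MuJoCo environments. As a usage illustration only, the snippet below loads the standard Gym versions; the "-v2" environment IDs assume the mujoco-py-era Gym releases in common use at the time, and the meta-RL task variants (Ant-Goal, Ant-Vel, Cheetah-Highdim) are task wrappers from the cited works rather than registered Gym IDs.

```python
# Illustrative only: load the base MuJoCo environments via OpenAI Gym.
# The paper's meta-RL tasks wrap these with task-specific rewards and are
# not standard registered environments.
import gym

ant = gym.make("Ant-v2")              # base env underlying the Ant-Goal / Ant-Vel tasks
cheetah = gym.make("HalfCheetah-v2")  # assumed base env for the Cheetah-Highdim task

obs = ant.reset()
obs, reward, done, info = ant.step(ant.action_space.sample())
```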
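
Finally, the Table 1 values from the Experiment Setup row, transcribed into a plain config dictionary; the key names are illustrative and do not necessarily match those in the released code.

```python
# Table 1 PEARL hyperparameters as a config dict; key names are assumptions.
pearl_hyperparameters = {
    "nonlinearity": "relu",
    "optimizer": "adam",
    "policy_learning_rate": 3e-4,
    "q_function_learning_rate": 3e-4,
    "batch_size": 256,
    "replay_buffer_size": int(1e6),
}
```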