Planning Goals for Exploration
Authors: Edward S. Hu, Richard Chang, Oleh Rybkin, Dinesh Jayaraman
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate PEG and other goal-conditioned RL agents on four different continuous-control environments, described below. For each environment, we define an evaluation set of goals. As a general principle, we pick evaluation goals in each environment that require extensive exploration in order for the agent to learn a successful evaluation goal-reaching policy. ... Figure 4: Performance of agents with different goal-setting strategies. All methods are run with 10 seeds. PEG outperforms all baselines, and its performance gain increases with environment difficulty. |
| Researcher Affiliation | Academia | GRASP Lab, Department of CIS, University of Pennsylvania {hued, huangkun, oleh, dineshj}@seas.upenn.edu |
| Pseudocode | Yes | Algorithm 1 LEXA Training Loop |
| Open Source Code | No | To ensure reproducibility, we will release the codebase that contains our method, baselines, and environments. |
| Open Datasets | Yes | Walker: In this environment from Tassa et al. (2018)... Ant Maze: We increased exploration difficulty in the Ant Maze environment from MEGA (Pitis et al., 2020)... 3-Block Stacking: This environment is a modification from the Fetch Stack3 environment in Pitis et al. (2020). Point Maze: This environment is taken directly from Pitis et al. (2020) with no modifications. |
| Dataset Splits | No | The paper describes a reinforcement learning setup where data is collected through interaction, and defines 'evaluation goals' for testing, but does not provide specific train/validation/test dataset splits in terms of percentages or counts from a static dataset, which are typically found in supervised learning tasks. |
| Hardware Specification | Yes | Each seed was run on 1 GPU (Nvidia 2080ti or Nvidia 3090) and 4 CPUs, and required 11GB of GPU memory. |
| Software Dependencies | No | The paper mentions using 'Dreamer V2 (Hafner et al., 2021)' and 'LEXA (Mendonca et al., 2021)' but does not provide specific version numbers for these or other software libraries. |
| Experiment Setup | Yes | We used the default hyperparameters for training the world model, policies, value functions, and temporal reward functions. For PEG, we tried various values of K for simulating trajectories of πG for each goal and found K = 1 to be sufficient. We use the same Go-Explore mechanism across all goal-setting methods: the Go and Explore phase time limits are set to half of the maximum episode length for all environments, while non-Go-Explore baselines use the full episode length for exploration. ... For each experiment, we tried weight values of (1, 2, 10) by running 1-2 seeds of PEG for each value. We used a weight of 1, 2, 2, 10 for the 4 experiments respectively. PEG uses MPPI, a sample-based optimizer, to optimize the objective. ... We therefore just choose as many samples (2000 candidates) and rounds (5 optimization rounds) as we can while keeping training time reasonable. (Hedged sketches of the Go-Explore episode structure and the MPPI goal search appear after the table.) |
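
The Experiment Setup row describes a Go-Explore episode structure shared by all goal-setting methods: a goal-conditioned policy pursues a chosen goal for the first half of the episode, then a task-agnostic exploration policy takes over. The sketch below illustrates only that structure; `env`, `goal_policy`, `explore_policy`, and `choose_goal` are hypothetical placeholders, not the authors' released code.

```python
# Hedged sketch of a Go-Explore style data-collection episode, assuming
# hypothetical `env`, `goal_policy`, `explore_policy`, and `choose_goal` objects.
# The paper sets both phase time limits to half of the maximum episode length.

def collect_episode(env, goal_policy, explore_policy, choose_goal, max_steps):
    """Run one Go (goal-reaching) phase followed by one Explore phase."""
    trajectory = []
    obs = env.reset()
    goal = choose_goal()          # e.g. PEG or a baseline goal-setting strategy
    half = max_steps // 2

    # Go phase: the goal-conditioned policy tries to reach the chosen goal.
    for _ in range(half):
        action = goal_policy.act(obs, goal)
        next_obs, reward, done, info = env.step(action)
        trajectory.append((obs, action, next_obs, goal))
        obs = next_obs
        if done:
            return trajectory

    # Explore phase: a task-agnostic exploration policy continues from the
    # frontier state reached at the end of the Go phase.
    for _ in range(max_steps - half):
        action = explore_policy.act(obs)
        next_obs, reward, done, info = env.step(action)
        trajectory.append((obs, action, next_obs, goal))
        obs = next_obs
        if done:
            break
    return trajectory
```

Per the quoted setup, non-Go-Explore baselines would instead run a single policy for the full episode length.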
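
The same row states that PEG optimizes its goal-setting objective with MPPI, a sample-based optimizer, using 2000 candidate goals and 5 optimization rounds. Below is a minimal sketch of such a sampling-based search over goals, assuming a hypothetical scalar `exploration_value` function in place of PEG's world-model-based objective.

```python
import numpy as np

def mppi_optimize_goal(exploration_value, goal_dim,
                       num_candidates=2000, num_rounds=5, temperature=1.0):
    """Sample-based (MPPI-style) search for a goal that maximizes a scalar
    exploration objective. `exploration_value` maps a batch of candidate
    goals of shape (N, goal_dim) to scores of shape (N,); it is a
    hypothetical placeholder, not PEG's exact objective."""
    mean = np.zeros(goal_dim)
    std = np.ones(goal_dim)

    for _ in range(num_rounds):
        # Propose candidate goals around the current search distribution.
        candidates = mean + std * np.random.randn(num_candidates, goal_dim)
        scores = exploration_value(candidates)

        # Exponentially weight candidates by score and refit the distribution.
        weights = np.exp((scores - scores.max()) / temperature)
        weights /= weights.sum()
        mean = (weights[:, None] * candidates).sum(axis=0)
        std = np.sqrt((weights[:, None] * (candidates - mean) ** 2).sum(axis=0) + 1e-6)

    return mean  # the optimized goal to command during the Go phase
```

Unlike elite-set methods such as CEM, MPPI reweights every candidate by an exponential of its score, so the `temperature` parameter controls how sharply the update concentrates on the best samples.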