Bisimulation Makes Analogies in Goal-Conditioned Reinforcement Learning

Authors: Philippe Hansen-Estruch, Amy Zhang, Ashvin Nair, Patrick Yin, Sergey Levine

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We learn this representation using a metric form of this abstraction, and show its ability to generalize to new goals in simulation manipulation tasks. Further, we prove that this learned representation is sufficient not only for goal-conditioned tasks, but is amenable to any downstream task described by a state-only reward function. Videos can be found at https://sites.google.com/view/gc-bisimulation. Our contributions consist of the following: 4) an evaluation on manipulation environments that shows improved performance on goal-conditioned tasks compared to other self-supervised representation learning methods. (See the bisimulation-loss sketch after this table.)
Researcher Affiliation | Collaboration | 1University of California, Berkeley; 2Meta AI Research. Correspondence to: Philippe Hansen-Estruch <hansenpmeche@berkeley.edu>, Amy Zhang <amyzhang@fb.com>.
Pseudocode | Yes | Algorithm 1 details the training of GCB.
Open Source Code | No | The paper mentions a link for videos ('Videos can be found at https://sites.google.com/view/gc-bisimulation') but does not provide an explicit statement or link for the open-source code of the methodology itself.
Open Datasets | No | To collect the offline replay buffer, we use a noisy expert policy π_demo(a_i | s, g) = π(a_i | s, g) + N(0, 0.3), where −1 ≤ a_i ≤ 1 for all i. 50K transitions are collected for the training set for the following offline RL experiments, where the demonstrated policy reaches the goal in roughly 80% of attempted episodes. (See the data-collection sketch after this table.)
Dataset Splits | No | The paper states that '50K transitions are collected for the training set' but does not explicitly provide information about training/validation/test dataset splits, exact percentages, or sample counts for each split.
Hardware Specification | No | The paper mentions 'compute support from Google Cloud, Berkeley Research Computing, Meta, and Azure' in the acknowledgements, but it does not specify any particular hardware details such as exact GPU or CPU models, memory, or specific cloud instance types used for experiments.
Software Dependencies | No | We modify a PyTorch implementation of IQL (Kostrikov et al., 2021) for our offline RL algorithm.
Experiment Setup | Yes | The hyperparameters used for the experiment are in Table 2. We use Adam for optimization (Kingma and Ba, 2015). As we focus on the offline RL setting in our experiments, we implement our method on top of implicit Q-learning (IQL) (Kostrikov et al., 2021), a recent offline RL algorithm. The representation and policy are trained concurrently. Table 2. GCB Hyperparameters. (See the optimizer-setup sketch after this table.)
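
The Research Type row describes learning a representation from a metric form of a goal-conditioned abstraction. As a rough, non-authoritative illustration of the general technique only, the sketch below shows a bisimulation-metric representation loss in the style of deep bisimulation for control (Zhang et al., 2021), which this paper extends; the paper's actual goal-conditioned objective (GCB) is not reproduced here, and `encoder`, the diagonal-Gaussian latent-dynamics outputs (`mu_*`, `sigma_*`), and `gamma` are assumed placeholders.

```python
import torch
import torch.nn.functional as F

def bisimulation_loss(encoder, obs_i, obs_j, rew_i, rew_j,
                      mu_i, sigma_i, mu_j, sigma_j, gamma=0.99):
    """DBC-style bisimulation-metric loss (illustrative only, not GCB)."""
    z_i, z_j = encoder(obs_i), encoder(obs_j)
    # L1 distance between latent codes should match the bisimulation target.
    z_dist = torch.abs(z_i - z_j).sum(dim=-1)
    r_dist = torch.abs(rew_i - rew_j)
    # 2-Wasserstein distance between diagonal-Gaussian latent dynamics predictions.
    w2 = torch.sqrt(((mu_i - mu_j) ** 2).sum(dim=-1)
                    + ((sigma_i - sigma_j) ** 2).sum(dim=-1))
    target = r_dist + gamma * w2
    return F.mse_loss(z_dist, target.detach())
```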
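
The Open Datasets row quotes the offline data-collection scheme: a noisy expert π_demo(a_i | s, g) = π(a_i | s, g) + N(0, 0.3) with actions bounded in [−1, 1], gathering 50K transitions. Below is a minimal sketch of how such a buffer might be collected, assuming a generic goal-conditioned `env` and an `expert_policy` callable; these interfaces are hypothetical, not code from the paper.

```python
import numpy as np

def collect_noisy_expert_buffer(env, expert_policy, num_transitions=50_000,
                                noise_std=0.3):
    """Collect an offline buffer from an expert with Gaussian action noise."""
    buffer = []
    obs, goal = env.reset()  # assumed goal-conditioned reset interface
    while len(buffer) < num_transitions:
        action = expert_policy(obs, goal)  # expert action for (state, goal)
        # Add N(0, noise_std) noise and keep actions within [-1, 1].
        action = np.clip(action + np.random.normal(0.0, noise_std,
                                                   size=action.shape),
                         -1.0, 1.0)
        next_obs, reward, done, _ = env.step(action)  # assumed gym-style step
        buffer.append((obs, goal, action, reward, next_obs, done))
        obs = next_obs
        if done:
            obs, goal = env.reset()
    return buffer
```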
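
The Experiment Setup row states that the method is built on a PyTorch IQL implementation, optimized with Adam, with the representation and policy trained concurrently. The sketch below shows only that optimizer setup, with placeholder network sizes and a placeholder learning rate; the real hyperparameters are in the paper's Table 2 and are not reproduced here.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions and networks -- placeholders, not the paper's
# architecture or the values in its Table 2.
obs_dim, goal_dim, latent_dim, action_dim = 39, 3, 64, 4

encoder = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                        nn.Linear(256, latent_dim))
policy = nn.Sequential(nn.Linear(latent_dim + goal_dim, 256), nn.ReLU(),
                       nn.Linear(256, action_dim), nn.Tanh())  # actions in [-1, 1]

# Adam (Kingma & Ba, 2015), as stated in the Experiment Setup row.
# "Trained concurrently" is read here as stepping both optimizers on every
# batch inside a single IQL-based training loop.
encoder_opt = torch.optim.Adam(encoder.parameters(), lr=3e-4)
policy_opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
```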