Bisimulation Makes Analogies in Goal-Conditioned Reinforcement Learning

Authors: Philippe Hansen-Estruch, Amy Zhang, Ashvin Nair, Patrick Yin, Sergey Levine

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We learn this representation using a metric form of this abstraction, and show its ability to generalize to new goals in simulation manipulation tasks. Further, we prove that this learned representation is sufficient not only for goal-conditioned tasks, but is amenable to any downstream task described by a state-only reward function. Videos can be found at https://sites.google.com/view/gc-bisimulation. Our contributions consist of the following: 4) an evaluation on manipulation environments that shows improved performance on goal-conditioned tasks compared to other self-supervised representation learning methods. (See the bisimulation-loss sketch after this table.)
Researcher Affiliation | Collaboration | 1University of California, Berkeley; 2Meta AI Research. Correspondence to: Philippe Hansen-Estruch <hansenpmeche@berkeley.edu>, Amy Zhang <amyzhang@fb.com>.
Pseudocode | Yes | Algorithm 1 details the training of GCB.
Open Source Code | No | The paper mentions a link for videos ('Videos can be found at https://sites.google.com/view/gc-bisimulation') but does not provide an explicit statement or link for the open-source code of the methodology itself.
Open Datasets | No | To collect the offline replay buffer, we use a noisy expert policy π_demo(a_i | s, g) = π(a_i | s, g) + N(0, 0.3), where −1 ≤ a_i ≤ 1 for all i. 50K transitions are collected for the training set for the following offline RL experiments, where the demonstrated policy reaches the goal in roughly 80% of attempted episodes. (See the data-collection sketch after this table.)
Dataset Splits | No | The paper states that '50K transitions are collected for the training set' but does not explicitly provide information about training/validation/test dataset splits, exact percentages, or sample counts for each split.
Hardware Specification | No | The paper mentions 'compute support from Google Cloud, Berkeley Research Computing, Meta, and Azure' in the acknowledgements, but it does not specify any particular hardware details such as exact GPU or CPU models, memory, or specific cloud instance types used for experiments.
Software Dependencies | No | We modify a PyTorch implementation of IQL (Kostrikov et al., 2021) for our offline RL algorithm.
Experiment Setup | Yes | The hyperparameters used for the experiment are in Table 2. We use Adam for optimization (Kingma and Ba, 2015). As we focus on the offline RL setting in our experiments, we implement our method on top of implicit Q-learning (IQL) (Kostrikov et al., 2021), a recent offline RL algorithm. The representation and policy are trained concurrently. Table 2. GCB Hyperparameters. (See the optimizer-setup sketch after this table.)
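
The Research Type row describes learning a representation from a metric form of a goal-conditioned abstraction. As a rough, non-authoritative illustration of the general technique only, the sketch below shows a bisimulation-metric representation loss in the style of deep bisimulation for control (Zhang et al., 2021), which this paper extends; the paper's actual goal-conditioned objective (GCB) is not reproduced here, and `encoder`, the diagonal-Gaussian latent-dynamics outputs (`mu_*`, `sigma_*`), and `gamma` are assumed placeholders.

```python
import torch
import torch.nn.functional as F

def bisimulation_loss(encoder, obs_i, obs_j, rew_i, rew_j,
                      mu_i, sigma_i, mu_j, sigma_j, gamma=0.99):
    """DBC-style bisimulation-metric loss (illustrative only, not GCB)."""
    z_i, z_j = encoder(obs_i), encoder(obs_j)
    # L1 distance between latent codes should match the bisimulation target.
    z_dist = torch.abs(z_i - z_j).sum(dim=-1)
    r_dist = torch.abs(rew_i - rew_j)
    # 2-Wasserstein distance between diagonal-Gaussian latent dynamics predictions.
    w2 = torch.sqrt(((mu_i - mu_j) ** 2).sum(dim=-1)
                    + ((sigma_i - sigma_j) ** 2).sum(dim=-1))
    target = r_dist + gamma * w2
    return F.mse_loss(z_dist, target.detach())
```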
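
The Open Datasets row quotes the offline data-collection scheme: a noisy expert π_demo(a_i | s, g) = π(a_i | s, g) + N(0, 0.3) with actions bounded in [−1, 1], gathering 50K transitions. Below is a minimal sketch of how such a buffer might be collected, assuming a generic goal-conditioned `env` and an `expert_policy` callable; these interfaces are hypothetical, not code from the paper.

```python
import numpy as np

def collect_noisy_expert_buffer(env, expert_policy, num_transitions=50_000,
                                noise_std=0.3):
    """Collect an offline buffer from an expert with Gaussian action noise."""
    buffer = []
    obs, goal = env.reset()  # assumed goal-conditioned reset interface
    while len(buffer) < num_transitions:
        action = expert_policy(obs, goal)  # expert action for (state, goal)
        # Add N(0, noise_std) noise and keep actions within [-1, 1].
        action = np.clip(action + np.random.normal(0.0, noise_std,
                                                   size=action.shape),
                         -1.0, 1.0)
        next_obs, reward, done, _ = env.step(action)  # assumed gym-style step
        buffer.append((obs, goal, action, reward, next_obs, done))
        obs = next_obs
        if done:
            obs, goal = env.reset()
    return buffer
```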
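
The Experiment Setup row states that the method is built on a PyTorch IQL implementation, optimized with Adam, with the representation and policy trained concurrently. The sketch below shows only that optimizer setup, with placeholder network sizes and a placeholder learning rate; the real hyperparameters are in the paper's Table 2 and are not reproduced here.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions and networks -- placeholders, not the paper's
# architecture or the values in its Table 2.
obs_dim, goal_dim, latent_dim, action_dim = 39, 3, 64, 4

encoder = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                        nn.Linear(256, latent_dim))
policy = nn.Sequential(nn.Linear(latent_dim + goal_dim, 256), nn.ReLU(),
                       nn.Linear(256, action_dim), nn.Tanh())  # actions in [-1, 1]

# Adam (Kingma & Ba, 2015), as stated in the Experiment Setup row.
# "Trained concurrently" is read here as stepping both optimizers on every
# batch inside a single IQL-based training loop.
encoder_opt = torch.optim.Adam(encoder.parameters(), lr=3e-4)
policy_opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
```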