METRA: Scalable Unsupervised RL with Metric-Aware Abstraction

Authors: Seohong Park, Oleh Rybkin, Sergey Levine

ICLR 2024

Reproducibility assessment. Each entry lists the variable, the assessed result, and the supporting LLM response.

Research Type: Experimental
LLM Response: Through our experiments in benchmark environments, we aim to answer the following questions: (1) Can METRA scale to complex, high-dimensional environments, including domains with image observations? (2) Does METRA discover meaningful behaviors in complex environments with no supervision? (3) Are the behaviors discovered by METRA useful for downstream tasks?

Researcher Affiliation: Academia
LLM Response: University of California, Berkeley (seohong@berkeley.edu)

Pseudocode: Yes
LLM Response: Algorithm 1 Metric-Aware Abstraction (METRA)
(A hedged code sketch of the intrinsic reward at the core of Algorithm 1 appears at the end of this section.)

Open Source Code: Yes
LLM Response: Our code and videos are available at https://seohong.me/projects/metra/

Open Datasets: Yes
LLM Response: We evaluate our method on five robotic locomotion and manipulation environments (Figure 4): state-based Ant and Half Cheetah from Gym (Todorov et al., 2012; Brockman et al., 2016), pixel-based Quadruped and Humanoid from the DeepMind Control (DMC) Suite (Tassa et al., 2018), and a pixel-based version of Kitchen from Gupta et al. (2019); Mendonca et al. (2021).

Dataset Splits: No
LLM Response: The paper evaluates policy coverage at each evaluation epoch (Figure 5), but it does not explicitly specify distinct training, validation, and test splits, nor how validation data would be used for hyperparameter tuning or early stopping, so the splits themselves cannot be reproduced.

Hardware Specification: Yes
LLM Response: We run our experiments on an internal cluster consisting of A5000 GPUs.

Software Dependencies: No
LLM Response: The paper mentions using the Adam optimizer and Soft Actor-Critic (SAC) as the RL backbone, but it does not specify version numbers for these components or for other software such as Python or PyTorch.

Experiment Setup: Yes
LLM Response: We present the full list of hyperparameters used for skill discovery methods in Table 2.

Table 2: Hyperparameters for unsupervised skill discovery methods.
  Learning rate: 0.0001
  Optimizer: Adam (Kingma & Ba, 2015)
  # episodes per epoch: 8
  # gradient steps per epoch: 200 (Quadruped, Humanoid), 100 (Kitchen), 50 (Ant, Half Cheetah)
  Minibatch size: 256
  Discount factor γ: 0.99
  Replay buffer size: 10^6 (Ant, Half Cheetah), 10^5 (Kitchen), 3 × 10^5 (Quadruped, Humanoid)
  Encoder: CNN (LeCun et al., 1989)
  # hidden layers: 2
  # hidden units per layer: 1024
  Target network smoothing coefficient: 0.995
  Entropy coefficient: 0.01 (Kitchen), auto-adjust (Haarnoja et al., 2018b) (others)
  METRA ε: 10^-3
  METRA initial λ: 30

(The second sketch at the end of this section illustrates how ε and the initial λ enter METRA's dual objective.)

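The Pseudocode entry above refers to Algorithm 1 (METRA). As a hedged illustration of the quantity that algorithm optimizes, the sketch below computes METRA's intrinsic reward r(s, z, s') = (φ(s') − φ(s))ᵀz with a placeholder MLP encoder. The names state_dim, skill_dim, and metra_intrinsic_reward are illustrative, not taken from the paper's codebase, and the dimensions are arbitrary; the paper uses a CNN encoder for pixel observations.

```python
import torch
import torch.nn as nn

# Illustrative dimensions (hypothetical, not the paper's exact settings).
state_dim, skill_dim = 32, 2

# phi maps states into the metric-aware latent space; two hidden layers of
# 1024 units follow Table 2's MLP settings.
phi = nn.Sequential(
    nn.Linear(state_dim, 1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, skill_dim),
)

def metra_intrinsic_reward(s, s_next, z):
    """r(s, z, s') = (phi(s') - phi(s))^T z, the reward the SAC agent maximizes."""
    with torch.no_grad():
        return ((phi(s_next) - phi(s)) * z).sum(dim=-1)

# Dummy batch of transitions with unit-length skill vectors (illustrative).
s = torch.randn(256, state_dim)
s_next = torch.randn(256, state_dim)
z = torch.nn.functional.normalize(torch.randn(256, skill_dim), dim=-1)
print(metra_intrinsic_reward(s, s_next, z).shape)  # torch.Size([256])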
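The METRA-specific hyperparameters in Table 2 (slack ε = 10^-3 and initial dual variable λ = 30) belong to the constrained representation objective max_φ E[(φ(s') − φ(s))ᵀz] subject to ‖φ(s) − φ(s')‖ ≤ 1, optimized with dual gradient descent. The sketch below is a minimal illustration of one common way to write that slack-constrained penalty; the tensor names are stand-ins, and this is not claimed to match the released implementation line for line.

```python
import torch

# Stand-ins for phi(s) and phi(s') on a minibatch; in practice these come
# from the encoder applied to transitions sampled from the replay buffer.
phi_s = torch.randn(256, 2, requires_grad=True)
phi_s_next = torch.randn(256, 2, requires_grad=True)
z = torch.nn.functional.normalize(torch.randn(256, 2), dim=-1)

eps = 1e-3                                    # slack epsilon from Table 2
lam = torch.tensor(30.0, requires_grad=True)  # dual variable, initial value 30

# Constraint residual: min(eps, 1 - ||phi(s) - phi(s')||^2). It is negative
# exactly when the temporal-distance constraint ||phi(s) - phi(s')|| <= 1
# is violated.
residual = torch.clamp(1.0 - (phi_s_next - phi_s).pow(2).sum(dim=-1), max=eps)

# Representation loss: maximize E[(phi(s') - phi(s))^T z] + lambda * residual.
phi_loss = -(((phi_s_next - phi_s) * z).sum(dim=-1) + lam.detach() * residual).mean()

# Dual loss: gradient descent on lambda * residual raises lambda when the
# constraint is violated and lowers it otherwise (lambda is kept >= 0).
lam_loss = (lam * residual.detach()).mean()
```

Because λ adapts through this dual update during training, its initial value mainly sets how strongly the constraint is enforced at the start.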