METRA: Scalable Unsupervised RL with Metric-Aware Abstraction
Authors: Seohong Park, Oleh Rybkin, Sergey Levine
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through our experiments in benchmark environments, we aim to answer the following questions: (1) Can METRA scale to complex, high-dimensional environments, including domains with image observations? (2) Does METRA discover meaningful behaviors in complex environments with no supervision? (3) Are the behaviors discovered by METRA useful for downstream tasks? |
| Researcher Affiliation | Academia | 1University of California, Berkeley seohong@berkeley.edu |
| Pseudocode | Yes | Algorithm 1 Metric-Aware Abstraction (METRA) |
| Open Source Code | Yes | Our code and videos are available at https://seohong.me/projects/metra/ |
| Open Datasets | Yes | We evaluate our method on five robotic locomotion and manipulation environments (Figure 4): state-based Ant and Half Cheetah from Gym (Todorov et al., 2012; Brockman et al., 2016), pixel-based Quadruped and Humanoid from the Deep Mind Control (DMC) Suite (Tassa et al., 2018), and a pixel-based version of Kitchen from Gupta et al. (2019); Mendonca et al. (2021). |
| Dataset Splits | No | The paper evaluates policy coverage at each "evaluation epoch" (Figure 5), but it does not specify distinct training, validation, and test splits, nor how validation data would be used for hyperparameter tuning or early stopping, so the splits themselves cannot be reproduced. |
| Hardware Specification | Yes | We run our experiments on an internal cluster consisting of A5000 GPUs. |
| Software Dependencies | No | The paper mentions using Adam optimizer and Soft Actor-Critic (SAC) as an RL backbone, but it does not specify version numbers for these software components or any other libraries like Python, PyTorch, etc. |
| Experiment Setup | Yes | We present the full list of hyperparameters used for skill discovery methods in Table 2. Table 2: Hyperparameters for unsupervised skill discovery methods. Learning rate 0.0001; Optimizer Adam (Kingma & Ba, 2015); # episodes per epoch 8; # gradient steps per epoch 200 (Quadruped, Humanoid), 100 (Kitchen), 50 (Ant, Half Cheetah); Minibatch size 256; Discount factor γ 0.99; Replay buffer size 10^6 (Ant, Half Cheetah), 10^5 (Kitchen), 3×10^5 (Quadruped, Humanoid); Encoder CNN (LeCun et al., 1989); # hidden layers 2; # hidden units per layer 1024; Target network smoothing coefficient 0.995; Entropy coefficient 0.01 (Kitchen), auto-adjust (Haarnoja et al., 2018b) (others); METRA ε 10^-3; METRA initial λ 30 |
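The Table 2 hyperparameters above mix global values with per-environment overrides. A minimal sketch of how that configuration could be encoded, assuming a plain Python dict (the key names and the `hparams_for` helper are our own illustration, not from the paper's released code):

```python
# Hyperparameters transcribed from Table 2 of the METRA paper.
# Per-environment values are nested dicts; "default" covers the
# "(others)" entries. Key names are hypothetical.
metra_hparams = {
    "learning_rate": 1e-4,
    "optimizer": "Adam",
    "episodes_per_epoch": 8,
    "gradient_steps_per_epoch": {
        "Quadruped": 200, "Humanoid": 200,
        "Kitchen": 100,
        "Ant": 50, "HalfCheetah": 50,
    },
    "minibatch_size": 256,
    "discount_gamma": 0.99,
    "replay_buffer_size": {
        "Ant": 10**6, "HalfCheetah": 10**6,
        "Kitchen": 10**5,
        "Quadruped": 3 * 10**5, "Humanoid": 3 * 10**5,
    },
    "encoder": "CNN",
    "num_hidden_layers": 2,
    "hidden_units_per_layer": 1024,
    "target_smoothing_coef": 0.995,
    "entropy_coef": {"Kitchen": 0.01, "default": "auto-adjust"},
    "metra_epsilon": 1e-3,
    "metra_initial_lambda": 30,
}

def hparams_for(env: str) -> dict:
    """Resolve per-environment overrides into a flat config for one env."""
    resolved = {}
    for key, value in metra_hparams.items():
        if isinstance(value, dict):
            # Fall back to "default" when the env has no explicit entry.
            resolved[key] = value.get(env, value.get("default"))
        else:
            resolved[key] = value
    return resolved
```

For example, `hparams_for("Kitchen")` resolves to 100 gradient steps per epoch, a 10^5 replay buffer, and a fixed entropy coefficient of 0.01, matching the table's Kitchen column.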