Efficient RL via Disentangled Environment and Agent Representations
Authors: Kevin Gmelin, Shikhar Bahl, Russell Mendonca, Deepak Pathak
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that our method, Structured Environment-Agent Representations (SEAR), outperforms state-of-the-art model-free approaches over 18 different challenging visual simulation environments spanning 5 different robots. Empirically, we find that SEAR outperforms state-of-the-art model-free approaches for RL on a large suite of 18 different challenging environments, spanning 5 different robots, including the sawyer, franka and adroit-hand robots. |
| Researcher Affiliation | Academia | Kevin Gmelin*, Shikhar Bahl*, Russell Mendonca, Deepak Pathak (Carnegie Mellon University). Correspondence to: Kevin Gmelin <kgmelin11@gmail.com>, Shikhar Bahl <sbahl2@andrew.cmu.edu>. |
| Pseudocode | Yes | Algorithm 1 SEAR: Structured Environment and Agent Representations for Control |
| Open Source Code | Yes | Our code can be found at https://sear-rl.github.io |
| Open Datasets | Yes | Meta-World (Yu et al., 2020): table-top manipulation tasks performed by a Sawyer robot arm. Franka Kitchen (Gupta et al., 2019): manipulating objects in a realistic kitchen with a Franka arm. Hand Manipulation Suite (Rajeswaran et al., 2017): manipulating objects with an Adroit hand. Distracting Control Suite (Stone et al., 2021): a variant of the DM Control Suite (Tassa et al., 2018) with distractions added. Note that we used implementations of the Hand Manipulation Suite and Franka Kitchen from the D4RL benchmark (Fu et al., 2020). |
| Dataset Splits | No | The paper describes environmental settings and training hyperparameters, and mentions "Evaluation Frequency" and "Evaluation Episodes" (Table 2), but does not specify dataset splits (e.g., percentages or sample counts for training, validation, and test sets) in the way a supervised learning paper would. In reinforcement learning, data is typically generated interactively and stored in a replay buffer, rather than pre-split. |
| Hardware Specification | Yes | We train each model using a mix of RTX6000, A5000, A6000 or 2080Ti GPUs. |
| Software Dependencies | No | The paper mentions software such as the 'DrQ-v2 algorithm', 'MuJoCo-simulated environments', and the 'D4RL benchmark', but does not provide specific version numbers for these or for other software dependencies such as programming languages or libraries (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | Table 2: Hyperparameter Settings provides specific values for numerous experimental parameters, including 'Replay Buffer Size 2.5e5', 'Frame Stack 3', 'Batch Size 256', 'Learning Rate 1e-4', 'Reconstruction Loss Scaling Coefficient (c1) 0.01', and 'Mask Loss Scaling Coefficient (c2) 0.0025', among others. |
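
The loss-scaling coefficients quoted above suggest that SEAR combines a base RL objective with weighted auxiliary reconstruction and agent-mask terms. The snippet below is a minimal sketch of how such a weighted sum might be computed, assuming a PyTorch setup; the tensor shapes, the pixel-wise MSE and binary cross-entropy loss choices, and the stand-in `rl_loss` value are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

# Hyperparameters quoted from Table 2 of the paper.
C1_RECON = 0.01    # reconstruction loss scaling coefficient (c1)
C2_MASK = 0.0025   # mask loss scaling coefficient (c2)
BATCH_SIZE = 256
LEARNING_RATE = 1e-4


def total_loss(rl_loss: torch.Tensor,
               obs: torch.Tensor, recon: torch.Tensor,
               mask_logits: torch.Tensor, mask_target: torch.Tensor) -> torch.Tensor:
    """Combine a base RL loss with weighted auxiliary terms.

    Illustrative sketch only: the exact form of SEAR's reconstruction and
    mask losses may differ from the simple pixel-wise MSE and binary
    cross-entropy used here.
    """
    recon_loss = F.mse_loss(recon, obs)                           # image reconstruction term
    mask_loss = F.binary_cross_entropy_with_logits(mask_logits,   # agent/environment mask term
                                                   mask_target)
    return rl_loss + C1_RECON * recon_loss + C2_MASK * mask_loss


if __name__ == "__main__":
    # Dummy tensors with plausible shapes: 3 stacked RGB frames (9 channels)
    # at an assumed 84x84 resolution.
    obs = torch.rand(BATCH_SIZE, 9, 84, 84)
    recon = torch.rand(BATCH_SIZE, 9, 84, 84)
    mask_logits = torch.randn(BATCH_SIZE, 1, 84, 84)
    mask_target = torch.randint(0, 2, (BATCH_SIZE, 1, 84, 84)).float()
    rl_loss = torch.tensor(1.0)  # stand-in for the actor/critic loss
    print(total_loss(rl_loss, obs, recon, mask_logits, mask_target))
```

In this sketch, the small values of c1 (0.01) and c2 (0.0025) weight the auxiliary terms far more lightly than the base objective, so under this reading they act as regularizers on the shared encoder rather than dominating training.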