Efficient RL via Disentangled Environment and Agent Representations

Authors: Kevin Gmelin, Shikhar Bahl, Russell Mendonca, Deepak Pathak

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that our method, Structured Environment-Agent Representations (SEAR), outperforms state-of-the-art model-free approaches over 18 different challenging visual simulation environments spanning 5 different robots. Empirically, we find that SEAR outperforms state-of-the-art model-free approaches for RL on a large suite of 18 different challenging environments, spanning 5 different robots, including the sawyer, franka and adroit-hand robots.
Researcher Affiliation | Academia | Kevin Gmelin*, Shikhar Bahl*, Russell Mendonca, Deepak Pathak; all affiliated with Carnegie Mellon University. Correspondence to: Kevin Gmelin <kgmelin11@gmail.com>, Shikhar Bahl <sbahl2@andrew.cmu.edu>.
Pseudocode | Yes | Algorithm 1 (SEAR: Structured Environment and Agent Representations for Control)
Open Source Code | Yes | Our code can be found at https://sear-rl.github.io
Open Datasets | Yes | Meta-World (Yu et al., 2020): table-top manipulation tasks performed by a Sawyer robot arm. Franka Kitchen (Gupta et al., 2019): manipulating objects in a realistic kitchen with a Franka arm. Hand Manipulation Suite (Rajeswaran et al., 2017): manipulating objects with an Adroit hand. Distracting Control Suite (Stone et al., 2021): a variant of the DM Control suite (Tassa et al., 2018) with distractions added. Note that the Hand Manipulation Suite and Franka Kitchen implementations are taken from the D4RL benchmark (Fu et al., 2020); a loading sketch appears after this table.
Dataset Splits | No | The paper describes environment settings and training hyperparameters, and mentions "Evaluation Frequency" and "Evaluation Episodes" (Table 2), but it does not specify dataset splits (e.g., percentages or sample counts for training, validation, and test sets) in the way a supervised learning paper would. In reinforcement learning, data is typically generated by interacting with the environment and stored in a replay buffer rather than pre-split; see the sketch after this table.
Hardware Specification | Yes | We train each model using a mix of RTX6000, A5000, A6000 or 2080Ti GPUs.
Software Dependencies | No | The paper mentions software such as the DrQ-v2 algorithm, MuJoCo-simulated environments, and the D4RL benchmark, but does not provide specific version numbers for these or for other software dependencies such as programming languages or libraries (e.g., Python, PyTorch, CUDA).
Experiment Setup | Yes | Table 2 (Hyperparameter Settings) provides specific values for numerous experimental parameters, including 'Replay Buffer Size 2.5e5', 'Frame Stack 3', 'Batch Size 256', 'Learning Rate 1e-4', 'Reconstruction Loss Scaling Coefficient (c1) 0.01', and 'Mask Loss Scaling Coefficient (c2) 0.0025', among others; a configuration sketch follows this table.
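
Because the Open Datasets row notes that the Franka Kitchen and Hand Manipulation Suite implementations come from D4RL, a minimal loading sketch is given below. It is not taken from the authors' released code; the environment ID and the classic gym reset/step API are assumptions that depend on the installed d4rl and gym versions, and SEAR trains from image observations, so pixel-rendering wrappers (not shown) would also be needed.

    # Minimal sketch, assuming the d4rl package is installed; importing it registers
    # the Franka Kitchen (and Adroit) environments with gym. IDs vary across versions.
    import gym
    import d4rl  # noqa: F401  (side effect: registers the D4RL environments)

    env = gym.make('kitchen-complete-v0')   # Franka Kitchen (Gupta et al., 2019) via D4RL
    obs = env.reset()                       # classic gym API: reset() returns the observation
    obs, reward, done, info = env.step(env.action_space.sample())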
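
The Dataset Splits row points out that RL data is generated interactively and stored in a replay buffer rather than pre-split. The loop below sketches that idea only; the CartPole environment, the random policy, and all names are placeholders rather than the authors' implementation, the classic gym API is assumed, and the buffer and batch sizes mirror Table 2.

    import random
    from collections import deque
    import gym

    env = gym.make('CartPole-v1')           # placeholder task; SEAR uses image-based MuJoCo suites
    replay_buffer = deque(maxlen=250_000)   # Replay Buffer Size 2.5e5 (Table 2)
    batch_size = 256                        # Batch Size 256 (Table 2)

    obs = env.reset()
    for step in range(10_000):
        action = env.action_space.sample()               # stand-in for the learned policy
        next_obs, reward, done, info = env.step(action)  # classic gym 4-tuple API assumed
        replay_buffer.append((obs, action, reward, next_obs, done))
        obs = env.reset() if done else next_obs
        if len(replay_buffer) >= batch_size:
            # No fixed train/val/test split: minibatches are drawn from the growing buffer.
            batch = random.sample(replay_buffer, batch_size)
            # agent.update(batch) would perform a gradient step here; the agent is not sketched.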
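
The Experiment Setup row lists the Table 2 hyperparameters, including the two loss scaling coefficients. The sketch below collects them into a configuration dict and shows the weighted-sum combination the coefficient names suggest; this combination is an assumption based on those names, not the authors' code, and all identifiers are illustrative.

    config = {
        'replay_buffer_size': int(2.5e5),   # Replay Buffer Size
        'frame_stack': 3,                   # Frame Stack
        'batch_size': 256,                  # Batch Size
        'learning_rate': 1e-4,              # Learning Rate
        'c1_reconstruction': 0.01,          # Reconstruction Loss Scaling Coefficient (c1)
        'c2_mask': 0.0025,                  # Mask Loss Scaling Coefficient (c2)
    }

    def combined_loss(rl_loss, reconstruction_loss, mask_loss, cfg=config):
        # Presumed weighted sum: the base RL objective plus the two auxiliary
        # representation losses scaled by c1 and c2.
        return (rl_loss
                + cfg['c1_reconstruction'] * reconstruction_loss
                + cfg['c2_mask'] * mask_loss)

    example = combined_loss(rl_loss=1.0, reconstruction_loss=0.5, mask_loss=0.2)  # 1.0055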