Learning Invariant Representations for Reinforcement Learning without Reconstruction
Authors: Amy Zhang, Rowan Thomas McAllister, Roberto Calandra, Yarin Gal, Sergey Levine
ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluations demonstrate our non-reconstructive approach using bisimulation is substantially more robust to task-irrelevant distractors when compared to prior approaches that use reconstruction losses or contrastive losses. Our initial experiments insert natural videos into the background of MuJoCo control tasks as complex distractions. Our second setup is a high-fidelity highway driving task using CARLA (Dosovitskiy et al., 2017), showing that our representations can be trained effectively even on highly realistic images with many distractions, such as trees, clouds, buildings, and shadows. Our central hypothesis is that our non-reconstructive bisimulation-based representation learning approach should be substantially more robust to task-irrelevant distractors. To that end, we evaluate our method in a clean setting without distractors, as well as a much more difficult setting with distractors. We compare against several baselines. |
| Researcher Affiliation | Collaboration | Amy Zhang (1,2), Rowan McAllister (3), Roberto Calandra (2), Yarin Gal (4), Sergey Levine (3). Affiliations: 1 McGill University; 2 Facebook AI Research; 3 University of California, Berkeley; 4 OATML group, University of Oxford |
| Pseudocode | Yes | Algorithm 1 Deep Bisimulation for Control (DBC) and Algorithm 2 Train Policy (changes to SAC in blue). (A minimal sketch of the Algorithm 1 encoder loss appears after the table.) |
| Open Source Code | No | Code and install instructions in appendix. |
| Open Datasets | Yes | Natural Video Setting. Then, we incorporate natural video from the Kinetics dataset (Kay et al., 2017) as background (Zhang et al., 2018), shown in Figure 3 (bottom row). We construct a highway driving scenario with photo-realistic visual observations using the CARLA simulator (Dosovitskiy et al., 2017), shown in Figure 7. (A compositing sketch for the natural-video setting appears after the table.) |
| Dataset Splits | No | No specific text found that details train/validation/test dataset splits (e.g., percentages or sample counts). The paper mentions using DMC suite and CARLA, which have standard environments, but does not explicitly state how data was split for training, validation, and testing. |
| Hardware Specification | Yes | Each run took 12 hours on a GTX 1080 GPU. |
| Software Dependencies | No | We modify the Soft Actor-Critic PyTorch implementation by Yarats & Kostrikov (2020) |
| Experiment Setup | Yes | Table 2: A complete overview of used hyperparameters (restated as a hypothetical config sketch after the table). Replay buffer capacity: 10^6; Batch size: 128; Discount γ: 0.99; Optimizer: Adam; Critic learning rate: 10^-5; Critic target update frequency: 2; Critic Q-function soft-update rate τ_Q: 0.005; Critic encoder soft-update rate τ_φ: 0.005; Actor learning rate: 10^-5; Actor update frequency: 2; Actor log stddev bounds: [-5, 2]; Encoder learning rate: 10^-5; Decoder learning rate: 10^-5; Decoder weight decay: 10^-7; Temperature learning rate: 10^-4; Temperature Adam β1: 0.9; Init temperature: 0.1 |
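
The Pseudocode row above refers to Algorithm 1, Deep Bisimulation for Control. As a rough illustration of its core encoder update, below is a minimal PyTorch sketch of the bisimulation loss, assuming a batch of encoded latents, scalar rewards, and a diagonal-Gaussian latent dynamics model; the function name, argument names, and shapes are ours, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def bisimulation_loss(z, reward, dyn_mu, dyn_sigma, discount=0.99):
    """Sketch of the DBC encoder objective (hypothetical names/shapes).

    z:         (B, D) latents phi(s) from the image encoder
    reward:    (B, 1) rewards for each transition
    dyn_mu:    (B, D) predicted next-latent means (diagonal Gaussian)
    dyn_sigma: (B, D) predicted next-latent standard deviations
    """
    # Pair each batch element with a randomly permuted partner.
    perm = torch.randperm(z.size(0))
    z2, r2 = z[perm], reward[perm]
    mu2, sigma2 = dyn_mu[perm], dyn_sigma[perm]

    # The L1 distance between latents is regressed onto the bisimulation
    # target: reward difference plus the discounted 2-Wasserstein
    # distance between the predicted latent-dynamics distributions.
    z_dist = (z - z2).abs().sum(dim=-1)
    r_dist = (reward - r2).abs().squeeze(-1)
    # Closed-form W2 for Gaussians with diagonal covariance.
    w2 = torch.sqrt((dyn_mu - mu2).pow(2).sum(dim=-1)
                    + (dyn_sigma - sigma2).pow(2).sum(dim=-1))
    target = r_dist + discount * w2
    return F.mse_loss(z_dist, target.detach())
```

The closed-form W2 term is valid only for diagonal-Gaussian dynamics, which matches the probabilistic latent dynamics model the paper trains alongside the encoder.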
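
The hyperparameters quoted in the Experiment Setup row can be collected into a single configuration object; the dictionary below is a hypothetical restatement of Table 2, with key names of our choosing rather than the variable names used in the authors' code.

```python
# Hypothetical restatement of Table 2; key names are illustrative.
DBC_HPARAMS = {
    "replay_buffer_capacity": 10**6,
    "batch_size": 128,
    "discount": 0.99,
    "optimizer": "Adam",
    "critic_lr": 1e-5,
    "critic_target_update_freq": 2,
    "critic_q_soft_update_tau": 0.005,
    "critic_encoder_soft_update_tau": 0.005,
    "actor_lr": 1e-5,
    "actor_update_freq": 2,
    "actor_log_std_bounds": (-5, 2),
    "encoder_lr": 1e-5,
    "decoder_lr": 1e-5,
    "decoder_weight_decay": 1e-7,
    "temperature_lr": 1e-4,
    "temperature_adam_beta1": 0.9,
    "init_temperature": 0.1,
}
```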
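
For the natural-video distractor setting referenced in the Research Type and Open Datasets rows, the observation background is replaced with frames from Kinetics videos. The sketch below shows one plausible way to composite a video frame behind a rendered observation; the color-threshold masking rule is our assumption for illustration, while the actual procedure follows Zhang et al. (2018).

```python
import numpy as np

def composite_video_background(obs, video_frame, bg_color=None, tol=30):
    """Replace (near-)uniform background pixels of a rendered frame
    with a natural-video frame. The masking rule is an assumption,
    not the authors' exact procedure.

    obs, video_frame: (H, W, 3) uint8 arrays of the same size
    """
    obs = obs.copy()
    if bg_color is None:
        bg_color = obs[0, 0]  # assume a corner pixel is pure background
    bg_color = np.asarray(bg_color, dtype=np.int32)
    # Mark pixels whose color is within `tol` of the background color.
    diff = np.abs(obs.astype(np.int32) - bg_color)
    bg_mask = diff.max(axis=-1) < tol
    obs[bg_mask] = video_frame[bg_mask]
    return obs
```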