Learning Invariant Representations for Reinforcement Learning without Reconstruction
Authors: Amy Zhang, Rowan Thomas McAllister, Roberto Calandra, Yarin Gal, Sergey Levine
ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluations demonstrate our non-reconstructive approach using bisimulation is substantially more robust to task-irrelevant distractors when compared to prior approaches that use reconstruction losses or contrastive losses. Our initial experiments insert natural videos into the background of MuJoCo control tasks as complex distractions. Our second setup is a high-fidelity highway driving task using CARLA (Dosovitskiy et al., 2017), showing that our representations can be trained effectively even on highly realistic images with many distractions, such as trees, clouds, buildings, and shadows. Our central hypothesis is that our non-reconstructive bisimulation-based representation learning approach should be substantially more robust to task-irrelevant distractors. To that end, we evaluate our method in a clean setting without distractors, as well as a much more difficult setting with distractors. We compare against several baselines. |
| Researcher Affiliation | Collaboration | Amy Zhang (1,2), Rowan McAllister (3), Roberto Calandra (2), Yarin Gal (4), Sergey Levine (3). Affiliations: 1 McGill University; 2 Facebook AI Research; 3 University of California, Berkeley; 4 OATML group, University of Oxford |
| Pseudocode | Yes | Algorithm 1 Deep Bisimulation for Control (DBC) and Algorithm 2 Train Policy (changes to SAC in blue). (A minimal sketch of the Algorithm 1 encoder loss appears after the table.) |
| Open Source Code | No | Code and install instructions in appendix. |
| Open Datasets | Yes | Natural Video Setting. Then, we incorporate natural video from the Kinetics dataset (Kay et al., 2017) as background (Zhang et al., 2018), shown in Figure 3 (bottom row). We construct a highway driving scenario with photo-realistic visual observations using the CARLA simulator (Dosovitskiy et al., 2017), shown in Figure 7. (A compositing sketch for the natural-video setting appears after the table.) |
| Dataset Splits | No | No specific text found that details train/validation/test dataset splits (e.g., percentages or sample counts). The paper mentions using DMC suite and CARLA, which have standard environments, but does not explicitly state how data was split for training, validation, and testing. |
| Hardware Specification | Yes | Each run took 12 hours on a GTX 1080 GPU. |
| Software Dependencies | No | We modify the Soft Actor-Critic PyTorch implementation by Yarats & Kostrikov (2020) |
| Experiment Setup | Yes | Table 2: A complete overview of used hyperparameters (restated as a hypothetical config sketch after the table). Replay buffer capacity: 10^6; Batch size: 128; Discount γ: 0.99; Optimizer: Adam; Critic learning rate: 10^-5; Critic target update frequency: 2; Critic Q-function soft-update rate τ_Q: 0.005; Critic encoder soft-update rate τ_φ: 0.005; Actor learning rate: 10^-5; Actor update frequency: 2; Actor log stddev bounds: [-5, 2]; Encoder learning rate: 10^-5; Decoder learning rate: 10^-5; Decoder weight decay: 10^-7; Temperature learning rate: 10^-4; Temperature Adam β1: 0.9; Init temperature: 0.1 |
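
The Pseudocode row above refers to Algorithm 1, Deep Bisimulation for Control. As a rough illustration of its core encoder update, below is a minimal PyTorch sketch of the bisimulation loss, assuming a batch of encoded latents, scalar rewards, and a diagonal-Gaussian latent dynamics model; the function name, argument names, and shapes are ours, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def bisimulation_loss(z, reward, dyn_mu, dyn_sigma, discount=0.99):
    """Sketch of the DBC encoder objective (hypothetical names/shapes).

    z:         (B, D) latents phi(s) from the image encoder
    reward:    (B, 1) rewards for each transition
    dyn_mu:    (B, D) predicted next-latent means (diagonal Gaussian)
    dyn_sigma: (B, D) predicted next-latent standard deviations
    """
    # Pair each batch element with a randomly permuted partner.
    perm = torch.randperm(z.size(0))
    z2, r2 = z[perm], reward[perm]
    mu2, sigma2 = dyn_mu[perm], dyn_sigma[perm]

    # The L1 distance between latents is regressed onto the bisimulation
    # target: reward difference plus the discounted 2-Wasserstein
    # distance between the predicted latent-dynamics distributions.
    z_dist = (z - z2).abs().sum(dim=-1)
    r_dist = (reward - r2).abs().squeeze(-1)
    # Closed-form W2 for Gaussians with diagonal covariance.
    w2 = torch.sqrt((dyn_mu - mu2).pow(2).sum(dim=-1)
                    + (dyn_sigma - sigma2).pow(2).sum(dim=-1))
    target = r_dist + discount * w2
    return F.mse_loss(z_dist, target.detach())
```

The closed-form W2 term is valid only for diagonal-Gaussian dynamics, which matches the probabilistic latent dynamics model the paper trains alongside the encoder.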
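
The hyperparameters quoted in the Experiment Setup row can be collected into a single configuration object; the dictionary below is a hypothetical restatement of Table 2, with key names of our choosing rather than the variable names used in the authors' code.

```python
# Hypothetical restatement of Table 2; key names are illustrative.
DBC_HPARAMS = {
    "replay_buffer_capacity": 10**6,
    "batch_size": 128,
    "discount": 0.99,
    "optimizer": "Adam",
    "critic_lr": 1e-5,
    "critic_target_update_freq": 2,
    "critic_q_soft_update_tau": 0.005,
    "critic_encoder_soft_update_tau": 0.005,
    "actor_lr": 1e-5,
    "actor_update_freq": 2,
    "actor_log_std_bounds": (-5, 2),
    "encoder_lr": 1e-5,
    "decoder_lr": 1e-5,
    "decoder_weight_decay": 1e-7,
    "temperature_lr": 1e-4,
    "temperature_adam_beta1": 0.9,
    "init_temperature": 0.1,
}
```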
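
For the natural-video distractor setting referenced in the Research Type and Open Datasets rows, the observation background is replaced with frames from Kinetics videos. The sketch below shows one plausible way to composite a video frame behind a rendered observation; the color-threshold masking rule is our assumption for illustration, while the actual procedure follows Zhang et al. (2018).

```python
import numpy as np

def composite_video_background(obs, video_frame, bg_color=None, tol=30):
    """Replace (near-)uniform background pixels of a rendered frame
    with a natural-video frame. The masking rule is an assumption,
    not the authors' exact procedure.

    obs, video_frame: (H, W, 3) uint8 arrays of the same size
    """
    obs = obs.copy()
    if bg_color is None:
        bg_color = obs[0, 0]  # assume a corner pixel is pure background
    bg_color = np.asarray(bg_color, dtype=np.int32)
    # Mark pixels whose color is within `tol` of the background color.
    diff = np.abs(obs.astype(np.int32) - bg_color)
    bg_mask = diff.max(axis=-1) < tol
    obs[bg_mask] = video_frame[bg_mask]
    return obs
```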