Focus On What Matters: Separated Models For Visual-Based RL Generalization

Authors: Di Zhang, Bowen Lv, Hai Zhang, Feifan Yang, Junqiao Zhao, Hang Yu, Chang Huang, Hongtu Zhou, Chen Ye, Changjun Jiang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments in DMC demonstrate the SOTA performance of SMG in generalization, particularly excelling in video-background settings. Evaluations on robotic manipulation tasks further confirm the robustness of SMG in real-world applications.
Researcher Affiliation | Academia | Di Zhang, Bowen Lv, Hai Zhang, Feifan Yang, Junqiao Zhao, Hang Yu, Chang Huang, Hongtu Zhou, Chen Ye, Changjun Jiang; Department of Computer Science, Tongji University, Shanghai, China; MOE Key Lab of Embedded System and Service Computing, Tongji University, Shanghai, China; {2331922, 2151769, zhanghai12138, 2153299, zhaojunqiao}@tongji.edu.cn; {2053881, 2130790, zhouhongtu, yechen, cjjiang}@tongji.edu.cn
Pseudocode | Yes | Algorithm 1: SAC with Separated Models
Open Source Code | Yes | Source code is available at https://anonymous.4open.science/r/SMG/.
Open Datasets | Yes | We evaluate SMG’s effectiveness across a range of challenging visual-based RL tasks, including five tasks from DMControl [36] and two more realistic robotic manipulation tasks [17].
Dataset Splits | Yes | We train all methods for 500k steps (except walker-stand for 250k, as it converges faster) on the training setting and evaluate the zero-shot generalization performance on the four evaluation settings.
Hardware Specification | Yes | We conduct all experiments on a single machine equipped with an AMD EPYC 7B12 CPU (64 cores), 512 GB RAM, and eight NVIDIA GeForce RTX 3090 GPUs (24 GB memory each).
Software Dependencies | No | The paper mentions using SAC as the base algorithm and the Adam optimizer, but it does not specify exact version numbers for these software components or any other libraries used.
Experiment Setup | Yes | We report the hyperparameters used in our experiments in Table 5. We use the same hyperparameters for all seven tasks, except the action repeat and the mask ratio ρ. The auxiliary loss L_aux in SMG comprises five terms, whose weights seem challenging to balance. Through experiments, however, we found that equal weights for L_recon, L_mask, L_action, and L_back are sufficient to achieve good performance (except that λ_back is set to 2, since the background model must fit more complex images). For L_fore, too large a weight would cause the model to overfit the inaccurate attribution predictions of the early training stage (as we use the model output under the raw observation as ground truth), so we set it to 0.1.
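
To make the loss-weighting scheme quoted above concrete, here is a minimal Python sketch, assuming PyTorch-style scalar loss tensors; the weight values follow the reported settings, while the dictionary and function names are illustrative and not taken from the SMG source release.

```python
import torch

# Illustrative auxiliary-loss weights following the reported settings:
# equal weights (1.0) for the reconstruction, mask, and action terms,
# 2.0 for the background term, and 0.1 for the foreground term.
AUX_WEIGHTS = {
    "recon": 1.0,   # L_recon
    "mask": 1.0,    # L_mask
    "action": 1.0,  # L_action
    "back": 2.0,    # L_back: the background model has to fit more complex images
    "fore": 0.1,    # L_fore: kept small so the model does not overfit the noisy
                    # attribution predictions of the early training stage
}


def auxiliary_loss(losses: dict) -> torch.Tensor:
    """Weighted sum of the five auxiliary loss terms (hypothetical helper).

    `losses` maps the term names above to scalar loss tensors, e.g.
    {"recon": l_recon, "mask": l_mask, "action": l_action, ...}.
    """
    return sum(AUX_WEIGHTS[name] * value for name, value in losses.items())
```

In an SAC-style training step, this weighted sum would simply be added to the representation objective before the Adam update; the exact composition in the released code may differ.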