Focus On What Matters: Separated Models For Visual-Based RL Generalization
Authors: Di Zhang, Bowen Lv, Hai Zhang, Feifan Yang, Junqiao Zhao, Hang Yu, Chang Huang, Hongtu Zhou, Chen Ye, Changjun Jiang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments in DMC demonstrate the SOTA performance of SMG in generalization, particularly excelling in video-background settings. Evaluations on robotic manipulation tasks further confirm the robustness of SMG in real-world applications. |
| Researcher Affiliation | Academia | Di Zhang, Bowen Lv, Hai Zhang, Feifan Yang, Junqiao Zhao, Hang Yu, Chang Huang, Hongtu Zhou, Chen Ye, Changjun Jiang. Department of Computer Science, Tongji University, Shanghai, China; MOE Key Lab of Embedded System and Service Computing, Tongji University, Shanghai, China. {2331922, 2151769, zhanghai12138, 2153299, zhaojunqiao}@tongji.edu.cn; {2053881, 2130790, zhouhongtu, yechen, cjjiang}@tongji.edu.cn |
| Pseudocode | Yes | Algorithm 1: SAC with Separated Models (a hedged sketch of such a training loop appears after this table). |
| Open Source Code | Yes | Source code is available at https://anonymous.4open.science/r/SMG/. |
| Open Datasets | Yes | We evaluate SMG’s effectiveness across a range of challenging visual-based RL tasks, including five tasks from DMControl [36] and two more realistic robotic manipulation tasks [17]. |
| Dataset Splits | Yes | We train all methods for 500k steps (except walker-stand for 250k, as it converges faster) on the training setting and evaluate the zero-shot generalization performance on the four evaluation settings. |
| Hardware Specification | Yes | We conduct all experiments on a single machine equipped with an AMD EPYC 7B12 CPU (64 cores), 512GB RAM, and eight NVIDIA GeForce RTX 3090 GPUs (24 GB memory). |
| Software Dependencies | No | The paper mentions using SAC as the base algorithm and Adam optimizer, but it does not specify exact version numbers for these software components or any other libraries used. |
| Experiment Setup | Yes | We report the hyperparameters used in our experiments in Table 5. We use the same hyperparameters for all seven tasks, except for the action repeat and the mask ratio ρ. L_aux in SMG comprises five loss terms, whose weights may seem challenging to balance. However, through experiments we found that uniform weights for L_recon, L_mask, L_action, and L_back are sufficient to achieve good performance (except that λ_back is set to 2, since the background model must learn to fit more complex images). As for L_fore, too large a weight would lead the model to overfit the inaccurate attribution predictions in the early stage (as we use the model output under the raw observation as ground truth), so we set it to 0.1. (A hedged sketch of this weighting appears after the table.) |
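To make the weighting scheme in the Experiment Setup row concrete, here is a minimal sketch of how the five auxiliary loss terms could be combined, assuming a PyTorch setup. The weight values follow the quoted text (uniform weights of 1 for the reconstruction, mask, and action terms, 2 for the background term, 0.1 for the foreground term); the function and variable names are hypothetical and are not taken from the SMG codebase.

```python
import torch

# Hypothetical weights inferred from the paper's description: uniform weights
# of 1.0 for L_recon, L_mask, and L_action; λ_back = 2 because the background
# model must fit more complex images; λ_fore = 0.1 to avoid overfitting early,
# inaccurate attribution predictions.
LAMBDA = {"recon": 1.0, "mask": 1.0, "action": 1.0, "back": 2.0, "fore": 0.1}

def auxiliary_loss(losses: dict[str, torch.Tensor]) -> torch.Tensor:
    """Combine the five SMG auxiliary loss terms into a single L_aux.

    `losses` maps term names ("recon", "mask", "action", "back", "fore")
    to scalar tensors computed elsewhere in the training step.
    """
    return sum(LAMBDA[name] * value for name, value in losses.items())
```

For example, `auxiliary_loss({"recon": l_r, "mask": l_m, "action": l_a, "back": l_b, "fore": l_f})` returns the weighted sum that would be backpropagated through the separated models.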
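For the Pseudocode row, the following skeleton illustrates the overall shape that "Algorithm 1: SAC with Separated Models" suggests: standard SAC data collection and actor-critic updates, with the separated models trained jointly on the auxiliary objective. Every class and method name here (`SACAgent`, `SeparatedModels`, `ReplayBuffer`, and their methods) is a placeholder assumption rather than the paper's API; it reuses `auxiliary_loss` from the sketch above.

```python
# A minimal skeleton, assuming standard SAC machinery and a gym-style env.
# Only the overall structure (SAC updates plus the auxiliary objective L_aux)
# follows the paper; all names below are hypothetical.

def train(env, agent, models, buffer, total_steps=500_000):
    obs = env.reset()
    for step in range(total_steps):
        action = agent.act(obs)                       # sample from SAC policy
        next_obs, reward, done, _ = env.step(action)
        buffer.add(obs, action, reward, next_obs, done)
        obs = env.reset() if done else next_obs

        batch = buffer.sample()
        agent.update_critic(batch)                    # standard SAC critic step
        agent.update_actor_and_alpha(batch)           # standard SAC actor step
        losses = models.compute_losses(batch)         # recon/mask/action/back/fore
        models.optimize(auxiliary_loss(losses))       # joint update with L_aux
```

The 500k-step default mirrors the training budget quoted in the Dataset Splits row (with walker-stand trained for 250k steps).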