Focus On What Matters: Separated Models For Visual-Based RL Generalization
Authors: Di Zhang, Bowen Lv, Hai Zhang, Feifan Yang, Junqiao Zhao, Hang Yu, Chang Huang, Hongtu Zhou, Chen Ye, Changjun Jiang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments in DMC demonstrate the SOTA performance of SMG in generalization, particularly excelling in video-background settings. Evaluations on robotic manipulation tasks further confirm the robustness of SMG in real-world applications. |
| Researcher Affiliation | Academia | Di Zhang, Bowen Lv, Hai Zhang, Feifan Yang, Junqiao Zhao, Hang Yu, Chang Huang, Hongtu Zhou, Chen Ye, Changjun Jiang. Department of Computer Science, Tongji University, Shanghai, China; MOE Key Lab of Embedded System and Service Computing, Tongji University, Shanghai, China. {2331922, 2151769, zhanghai12138, 2153299, zhaojunqiao}@tongji.edu.cn; {2053881, 2130790, zhouhongtu, yechen, cjjiang}@tongji.edu.cn |
| Pseudocode | Yes | Algorithm 1: SAC with Separated Models (a hedged sketch of such a training loop appears after this table). |
| Open Source Code | Yes | Source code is available at https://anonymous.4open.science/r/SMG/. |
| Open Datasets | Yes | We evaluate SMG’s effectiveness across a range of challenging visual-based RL tasks, including five tasks from DMControl [36] and two more realistic robotic manipulation tasks [17]. |
| Dataset Splits | Yes | We train all methods for 500k steps (except walker-stand for 250k, as it converges faster) on the training setting and evaluate the zero-shot generalization performance on the four evaluation settings. |
| Hardware Specification | Yes | We conduct all experiments on a single machine equipped with an AMD EPYC 7B12 CPU (64 cores), 512GB RAM, and eight NVIDIA GeForce RTX 3090 GPUs (24 GB memory). |
| Software Dependencies | No | The paper mentions using SAC as the base algorithm and Adam optimizer, but it does not specify exact version numbers for these software components or any other libraries used. |
| Experiment Setup | Yes | We report the hyperparameters used in our experiments in Table 5. We use the same hyperparameters for all seven tasks, except for the action repeat and the mask ratio ρ. L_aux in SMG comprises five loss terms, whose weights may seem challenging to balance. However, through experiments we found that uniform weights for L_recon, L_mask, L_action, and L_back are sufficient to achieve good performance (except that λ_back is set to 2, since the background model must learn to fit more complex images). As for L_fore, too large a weight would lead the model to overfit the inaccurate attribution predictions in the early stage (as we use the model output under the raw observation as ground truth), so we set it to 0.1. (A hedged sketch of this weighting appears after the table.) |
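To make the weighting scheme in the Experiment Setup row concrete, here is a minimal sketch of how the five auxiliary loss terms could be combined, assuming a PyTorch setup. The weight values follow the quoted text (uniform weights of 1 for the reconstruction, mask, and action terms, 2 for the background term, 0.1 for the foreground term); the function and variable names are hypothetical and are not taken from the SMG codebase.

```python
import torch

# Hypothetical weights inferred from the paper's description: uniform weights
# of 1.0 for L_recon, L_mask, and L_action; λ_back = 2 because the background
# model must fit more complex images; λ_fore = 0.1 to avoid overfitting early,
# inaccurate attribution predictions.
LAMBDA = {"recon": 1.0, "mask": 1.0, "action": 1.0, "back": 2.0, "fore": 0.1}

def auxiliary_loss(losses: dict[str, torch.Tensor]) -> torch.Tensor:
    """Combine the five SMG auxiliary loss terms into a single L_aux.

    `losses` maps term names ("recon", "mask", "action", "back", "fore")
    to scalar tensors computed elsewhere in the training step.
    """
    return sum(LAMBDA[name] * value for name, value in losses.items())
```

For example, `auxiliary_loss({"recon": l_r, "mask": l_m, "action": l_a, "back": l_b, "fore": l_f})` returns the weighted sum that would be backpropagated through the separated models.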
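For the Pseudocode row, the following skeleton illustrates the overall shape that "Algorithm 1: SAC with Separated Models" suggests: standard SAC data collection and actor-critic updates, with the separated models trained jointly on the auxiliary objective. Every class and method name here (`SACAgent`, `SeparatedModels`, `ReplayBuffer`, and their methods) is a placeholder assumption rather than the paper's API; it reuses `auxiliary_loss` from the sketch above.

```python
# A minimal skeleton, assuming standard SAC machinery and a gym-style env.
# Only the overall structure (SAC updates plus the auxiliary objective L_aux)
# follows the paper; all names below are hypothetical.

def train(env, agent, models, buffer, total_steps=500_000):
    obs = env.reset()
    for step in range(total_steps):
        action = agent.act(obs)                       # sample from SAC policy
        next_obs, reward, done, _ = env.step(action)
        buffer.add(obs, action, reward, next_obs, done)
        obs = env.reset() if done else next_obs

        batch = buffer.sample()
        agent.update_critic(batch)                    # standard SAC critic step
        agent.update_actor_and_alpha(batch)           # standard SAC actor step
        losses = models.compute_losses(batch)         # recon/mask/action/back/fore
        models.optimize(auxiliary_loss(losses))       # joint update with L_aux
```

The 500k-step default mirrors the training budget quoted in the Dataset Splits row (with walker-stand trained for 250k steps).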