Regularization Matters in Policy Optimization - An Empirical Study on Continuous Control
Authors: Zhuang Liu, Xuanlin Li, Bingyi Kang, Trevor Darrell
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we present the first comprehensive study of regularization techniques with multiple policy optimization algorithms on continuous control tasks. In this work, we take an empirical approach to assess the conventional paradigm which omits common regularization when learning deep RL models. |
| Researcher Affiliation | Academia | Zhuang Liu (1), Xuanlin Li (1), Bingyi Kang (2), Trevor Darrell (1); (1) University of California, Berkeley; (2) National University of Singapore |
| Pseudocode | No | No pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | Yes | Our code is available at https://github.com/xuanlinli17/iclr2021_rlreg. |
| Open Datasets | Yes | The algorithms with different regularizers are tested on nine continuous control tasks: Hopper, Walker, HalfCheetah, Ant, Humanoid, and HumanoidStandup from MuJoCo (Todorov et al., 2012); Humanoid, AtlasForwardWalk, and HumanoidFlagrun from RoboSchool (OpenAI). Besides continuous control, we provide results on randomly sampled Atari environments (Bellemare et al., 2012) in Appendix S |
| Dataset Splits | No | The paper reports running each experiment independently with five seeds for statistical robustness and taking the average return over the last 100 episodes as the final result. However, it does not specify explicit training/validation/test splits as percentages or counts for a fixed dataset, as is typical in supervised learning; instead, data is generated dynamically from the environment during training and evaluation. (A minimal sketch of this evaluation protocol follows the table.) |
| Hardware Specification | Yes | We used up to 16 NVIDIA Titan Xp GPUs and 96 Intel Xeon E5-2667 CPUs, and all experiments take roughly 57 days with resources fully utilized. |
| Software Dependencies | No | The paper mentions specific software packages such as "OpenAI Baselines" and "Adam", and refers to the "official implementation at (Haarnoja, 2018)" for SAC. However, it does not provide version numbers for these software components, which are necessary for full reproducibility. |
| Experiment Setup | Yes | On MuJoCo tasks, we keep all hyperparameters unchanged as in the codebase adopted. Since hyperparameters for the RoboSchool tasks are not included, we briefly tune the hyperparameters for each algorithm before we apply regularization (details in Appendix D). For details on regularization strength tuning, please see Appendix C. (An illustrative sketch of applying a regularizer with a tunable strength follows this table.) |
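The evaluation protocol noted in the Dataset Splits row (five independent seeds, final result taken as the average return over the last 100 episodes) can be summarized in a minimal sketch. The array shapes and placeholder returns below are assumptions for illustration only; the actual returns come from the training logs of each run.

```python
import numpy as np

# Placeholder episode returns for illustration; in practice these come from
# the training logs of each (algorithm, task, regularizer) run.
num_seeds, num_episodes = 5, 3000
rng = np.random.default_rng(0)
episode_returns = rng.normal(loc=1000.0, scale=100.0, size=(num_seeds, num_episodes))

# Per-seed final result: average return over the last 100 training episodes.
per_seed_result = episode_returns[:, -100:].mean(axis=1)

# Aggregate across the five independent seeds.
print(f"final return: {per_seed_result.mean():.1f} +/- {per_seed_result.std():.1f}")
```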
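As context for the regularization-strength tuning mentioned in the Experiment Setup row, here is a minimal, hedged sketch of adding one of the conventional regularizers the paper studies (an L2 penalty) to a generic policy-optimization loss in PyTorch. The network, the coefficient value, and the helper `regularized_loss` are illustrative assumptions, not the authors' implementation; the actual losses and tuned strengths are given in the paper and its appendices.

```python
import torch
import torch.nn as nn

# Illustrative policy network; the actual architectures follow the adopted codebases.
policy = nn.Sequential(nn.Linear(11, 64), nn.Tanh(), nn.Linear(64, 3))

l2_coef = 1e-4  # regularization strength, tuned per task/algorithm in the paper

def regularized_loss(policy_loss: torch.Tensor) -> torch.Tensor:
    """Add an explicit L2 penalty over the policy parameters to the base policy loss."""
    l2_penalty = sum((p ** 2).sum() for p in policy.parameters())
    return policy_loss + l2_coef * l2_penalty

# Usage with a dummy surrogate loss standing in for a PPO/TRPO/A2C/SAC objective.
dummy_policy_loss = policy(torch.randn(32, 11)).pow(2).mean()
regularized_loss(dummy_policy_loss).backward()
```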