Regularization Matters in Policy Optimization - An Empirical Study on Continuous Control
Authors: Zhuang Liu, Xuanlin Li, Bingyi Kang, Trevor Darrell
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we present the first comprehensive study of regularization techniques with multiple policy optimization algorithms on continuous control tasks. In this work, we take an empirical approach to assess the conventional paradigm which omits common regularization when learning deep RL models. |
| Researcher Affiliation | Academia | Zhuang Liu (1), Xuanlin Li (1), Bingyi Kang (2), Trevor Darrell (1); (1) University of California, Berkeley; (2) National University of Singapore |
| Pseudocode | No | No pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | Yes | Our code is available at https://github.com/xuanlinli17/iclr2021_rlreg. |
| Open Datasets | Yes | The algorithms with different regularizers are tested on nine continuous control tasks: Hopper, Walker, HalfCheetah, Ant, Humanoid, and HumanoidStandup from MuJoCo (Todorov et al., 2012); Humanoid, AtlasForwardWalk, and HumanoidFlagrun from RoboSchool (OpenAI). Besides continuous control, we provide results on randomly sampled Atari environments (Bellemare et al., 2012) in Appendix S |
| Dataset Splits | No | The paper reports running each experiment independently with five seeds for statistical robustness and taking the average return over the last 100 episodes as the final result. However, it does not specify explicit training/validation/test splits as percentages or counts for a fixed dataset, as is typical in supervised learning; instead, data is generated dynamically from the environment during training and evaluation. (A minimal sketch of this evaluation protocol follows the table.) |
| Hardware Specification | Yes | We used up to 16 NVIDIA Titan Xp GPUs and 96 Intel Xeon E5-2667 CPUs, and all experiments take roughly 57 days with resources fully utilized. |
| Software Dependencies | No | The paper mentions specific software packages such as "OpenAI Baselines" and "Adam", and refers to the "official implementation at (Haarnoja, 2018)" for SAC. However, it does not provide version numbers for these software components, which are necessary for full reproducibility. |
| Experiment Setup | Yes | On MuJoCo tasks, we keep all hyperparameters unchanged as in the codebase adopted. Since hyperparameters for the RoboSchool tasks are not included, we briefly tune the hyperparameters for each algorithm before we apply regularization (details in Appendix D). For details on regularization strength tuning, please see Appendix C. (An illustrative sketch of applying a regularizer with a tunable strength follows this table.) |
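The evaluation protocol noted in the Dataset Splits row (five independent seeds, final result taken as the average return over the last 100 episodes) can be summarized in a minimal sketch. The array shapes and placeholder returns below are assumptions for illustration only; the actual returns come from the training logs of each run.

```python
import numpy as np

# Placeholder episode returns for illustration; in practice these come from
# the training logs of each (algorithm, task, regularizer) run.
num_seeds, num_episodes = 5, 3000
rng = np.random.default_rng(0)
episode_returns = rng.normal(loc=1000.0, scale=100.0, size=(num_seeds, num_episodes))

# Per-seed final result: average return over the last 100 training episodes.
per_seed_result = episode_returns[:, -100:].mean(axis=1)

# Aggregate across the five independent seeds.
print(f"final return: {per_seed_result.mean():.1f} +/- {per_seed_result.std():.1f}")
```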
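As context for the regularization-strength tuning mentioned in the Experiment Setup row, here is a minimal, hedged sketch of adding one of the conventional regularizers the paper studies (an L2 penalty) to a generic policy-optimization loss in PyTorch. The network, the coefficient value, and the helper `regularized_loss` are illustrative assumptions, not the authors' implementation; the actual losses and tuned strengths are given in the paper and its appendices.

```python
import torch
import torch.nn as nn

# Illustrative policy network; the actual architectures follow the adopted codebases.
policy = nn.Sequential(nn.Linear(11, 64), nn.Tanh(), nn.Linear(64, 3))

l2_coef = 1e-4  # regularization strength, tuned per task/algorithm in the paper

def regularized_loss(policy_loss: torch.Tensor) -> torch.Tensor:
    """Add an explicit L2 penalty over the policy parameters to the base policy loss."""
    l2_penalty = sum((p ** 2).sum() for p in policy.parameters())
    return policy_loss + l2_coef * l2_penalty

# Usage with a dummy surrogate loss standing in for a PPO/TRPO/A2C/SAC objective.
dummy_policy_loss = policy(torch.randn(32, 11)).pow(2).mean()
regularized_loss(dummy_policy_loss).backward()
```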