A Regularized Approach to Sparse Optimal Policy in Reinforcement Learning

Authors: Wenhao Yang, Xiang Li, Zhihua Zhang

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically analyze the numerical properties of optimal policies and compare the performance of different sparse regularization forms in discrete and continuous environments.
Researcher Affiliation | Academia | Xiang Li, School of Mathematical Sciences, Peking University, Beijing, China, lx10077@pku.edu.cn; Wenhao Yang, Center for Data Science, Peking University, Beijing, China, yangwenhaosms@pku.edu.cn; Zhihua Zhang, National Engineering Lab for Big Data Analysis and Applications, School of Mathematical Sciences, Peking University, Beijing, China, zhzhang@math.pku.edu.cn
Pseudocode | Yes | Algorithm 1: Regularized Actor-Critic (RAC). (A hedged sketch of the regularized objective appears after this table.)
Open Source Code | No | The paper does not provide any concrete access information (e.g., a link or an explicit statement of code release) for its source code.
Open Datasets | Yes | We test four basic regularizers across four discrete control tasks from the OpenAI Gym benchmark [5]. All the training details are in Appendix H.2. ... We explore basic regularizers across four continuous control tasks from the OpenAI Gym benchmark [5] with the MuJoCo simulator [38].
Dataset Splits | No | The paper states that training and experiment details are given in Appendices H.1, H.2, and H.3, but it does not specify exact training/validation/test splits (percentages or sample counts) in the main text.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU/CPU models or memory specifications.
Software Dependencies | No | The paper mentions tools such as the OpenAI Gym benchmark [5], the MuJoCo simulator [38], and the Adam optimizer [17], but it does not provide version numbers for any software dependencies.
Experiment Setup | Yes | All the training details are in Appendix H.2. ... For each curve, we train four different instances with different random seeds. ... Each entry in the legend is named following the rule "regularization form + λ". The score is smoothed with a window of 30, and the shaded area shows one standard deviation. (The captions of Figures 2 and 3 imply that λ values of {0.01, 0.1, 1.0} were tested, as mentioned in Section 5.2; see the plotting sketch after this table.)
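The Pseudocode and Experiment Setup rows reference Algorithm 1 (Regularized Actor-Critic) and regularizers swept over λ ∈ {0.01, 0.1, 1.0}. The following is a minimal sketch of a regularized per-state policy objective for a discrete action space, using Shannon entropy and a Tsallis-style entropy as two representative regularization forms; the function names and toy numbers are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def shannon_entropy(pi):
    """Shannon-entropy regularizer: -sum_a pi(a) * log pi(a)."""
    return -np.sum(pi * np.log(pi + 1e-12))

def tsallis_entropy(pi, q=2.0):
    """Tsallis-style regularizer (1 - sum_a pi(a)^q) / (q - 1); sparsity-inducing for q > 1."""
    return (1.0 - np.sum(pi ** q)) / (q - 1.0)

def regularized_objective(pi, q_values, lam, regularizer):
    """Per-state actor objective: E_{a~pi}[Q(s, a)] + lam * regularizer(pi)."""
    return float(np.dot(pi, q_values) + lam * regularizer(pi))

# Toy usage: compare the two regularization forms at the lambda values listed in the table.
pi = np.array([0.7, 0.2, 0.1])          # a discrete policy over three actions (assumed)
q_values = np.array([1.0, 0.5, -0.2])   # critic estimates Q(s, a) (assumed)
for lam in (0.01, 0.1, 1.0):
    print(f"lam={lam}: "
          f"shannon={regularized_objective(pi, q_values, lam, shannon_entropy):.3f}, "
          f"tsallis={regularized_objective(pi, q_values, lam, tsallis_entropy):.3f}")
```

Larger λ weights the regularizer more heavily relative to the critic's value estimates, which is the trade-off the λ sweep in the experiments probes.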
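The Experiment Setup row also states that each curve aggregates four random seeds, that scores are smoothed with a window of 30, and that the shaded area shows one standard deviation. Below is a minimal sketch of that aggregation using a simple moving average on synthetic data; the synthetic curves and helper names are assumptions for illustration, not the paper's evaluation code.

```python
import numpy as np

def moving_average(x, window=30):
    """Smooth a 1-D score curve with a simple moving average of the given window."""
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="valid")

# Synthetic per-seed learning curves standing in for the four training instances.
rng = np.random.default_rng(0)
num_seeds, num_episodes = 4, 500
raw_scores = np.cumsum(rng.normal(0.5, 5.0, size=(num_seeds, num_episodes)), axis=1)

smoothed = np.stack([moving_average(run, window=30) for run in raw_scores])
mean_curve = smoothed.mean(axis=0)       # solid line in the figures
std_band = smoothed.std(axis=0)          # half-width of the shaded region

print(mean_curve.shape, std_band.shape)  # (471,) (471,)
```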