A Regularized Approach to Sparse Optimal Policy in Reinforcement Learning

Authors: Wenhao Yang, Xiang Li, Zhihua Zhang

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically analyze the numerical properties of optimal policies and compare the performance of different sparse regularization forms in discrete and continuous environments.
Researcher Affiliation | Academia | Xiang Li, School of Mathematical Sciences, Peking University, Beijing, China, lx10077@pku.edu.cn; Wenhao Yang, Center for Data Science, Peking University, Beijing, China, yangwenhaosms@pku.edu.cn; Zhihua Zhang, National Engineering Lab for Big Data Analysis and Applications, School of Mathematical Sciences, Peking University, Beijing, China, zhzhang@math.pku.edu.cn
Pseudocode | Yes | Algorithm 1: Regularized Actor-Critic (RAC). (A hedged sketch of the regularized objective appears after this table.)
Open Source Code | No | The paper does not provide any concrete access information (e.g., a link or an explicit statement of code release) for its source code.
Open Datasets | Yes | We test four basic regularizers across four discrete control tasks from the OpenAI Gym benchmark [5]. All the training details are in Appendix H.2. ... We explore basic regularizers across four continuous control tasks from the OpenAI Gym benchmark [5] with the MuJoCo simulator [38].
Dataset Splits | No | The paper states that training and experiment details are given in Appendices H.1, H.2, and H.3, but it does not specify exact training/validation/test splits (percentages or sample counts) in the main text.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU/CPU models or memory specifications.
Software Dependencies | No | The paper mentions tools such as the OpenAI Gym benchmark [5], the MuJoCo simulator [38], and the Adam optimizer [17], but it does not provide version numbers for any software dependencies.
Experiment Setup | Yes | All the training details are in Appendix H.2. ... For each curve, we train four different instances with different random seeds. ... Each entry in the legend is named following the rule "regularization form + λ". The score is smoothed with a window of 30, and the shaded area shows one standard deviation. (The captions of Figures 2 and 3 imply that λ values of {0.01, 0.1, 1.0} were tested, as mentioned in Section 5.2; see the plotting sketch after this table.)
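The Pseudocode and Experiment Setup rows reference Algorithm 1 (Regularized Actor-Critic) and regularizers swept over λ ∈ {0.01, 0.1, 1.0}. The following is a minimal sketch of a regularized per-state policy objective for a discrete action space, using Shannon entropy and a Tsallis-style entropy as two representative regularization forms; the function names and toy numbers are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def shannon_entropy(pi):
    """Shannon-entropy regularizer: -sum_a pi(a) * log pi(a)."""
    return -np.sum(pi * np.log(pi + 1e-12))

def tsallis_entropy(pi, q=2.0):
    """Tsallis-style regularizer (1 - sum_a pi(a)^q) / (q - 1); sparsity-inducing for q > 1."""
    return (1.0 - np.sum(pi ** q)) / (q - 1.0)

def regularized_objective(pi, q_values, lam, regularizer):
    """Per-state actor objective: E_{a~pi}[Q(s, a)] + lam * regularizer(pi)."""
    return float(np.dot(pi, q_values) + lam * regularizer(pi))

# Toy usage: compare the two regularization forms at the lambda values listed in the table.
pi = np.array([0.7, 0.2, 0.1])          # a discrete policy over three actions (assumed)
q_values = np.array([1.0, 0.5, -0.2])   # critic estimates Q(s, a) (assumed)
for lam in (0.01, 0.1, 1.0):
    print(f"lam={lam}: "
          f"shannon={regularized_objective(pi, q_values, lam, shannon_entropy):.3f}, "
          f"tsallis={regularized_objective(pi, q_values, lam, tsallis_entropy):.3f}")
```

Larger λ weights the regularizer more heavily relative to the critic's value estimates, which is the trade-off the λ sweep in the experiments probes.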
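The Experiment Setup row also states that each curve aggregates four random seeds, that scores are smoothed with a window of 30, and that the shaded area shows one standard deviation. Below is a minimal sketch of that aggregation using a simple moving average on synthetic data; the synthetic curves and helper names are assumptions for illustration, not the paper's evaluation code.

```python
import numpy as np

def moving_average(x, window=30):
    """Smooth a 1-D score curve with a simple moving average of the given window."""
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="valid")

# Synthetic per-seed learning curves standing in for the four training instances.
rng = np.random.default_rng(0)
num_seeds, num_episodes = 4, 500
raw_scores = np.cumsum(rng.normal(0.5, 5.0, size=(num_seeds, num_episodes)), axis=1)

smoothed = np.stack([moving_average(run, window=30) for run in raw_scores])
mean_curve = smoothed.mean(axis=0)       # solid line in the figures
std_band = smoothed.std(axis=0)          # half-width of the shaded region

print(mean_curve.shape, std_band.shape)  # (471,) (471,)
```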