Control Regularization for Reduced Variance Reinforcement Learning
Authors: Richard Cheng, Abhinav Verma, Gabor Orosz, Swarat Chaudhuri, Yisong Yue, Joel Burdick
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our approach empirically on a range of settings, and demonstrate significantly reduced variance, guaranteed dynamic stability, and more efficient learning than deep RL alone. |
| Researcher Affiliation | Academia | California Institute of Technology, Pasadena, CA; Rice University, Houston, TX; University of Michigan, Ann Arbor, MI. |
| Pseudocode | Yes | Algorithm 1 Control Regularized RL (CORE-RL) |
| Open Source Code | Yes | All code can be found at https://github.com/rcheng805/CORE-RL. |
| Open Datasets | Yes | We apply the CORE-RL algorithm to control of the cartpole from the OpenAI Gym environment (CartPole-v1). ... The experimental setup and data collection process are described in (Ge et al., 2018). |
| Dataset Splits | No | The paper describes running experiments multiple times with different random seeds and splitting data into episodes, but does not provide specific train/validation/test dataset splits (e.g., percentages, sample counts, or explicit splits for the environments). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory, or cloud instance types) used for running the experiments. |
| Software Dependencies | No | The paper mentions software components such as DDPG, PPO, TRPO, OpenAI Gym, and TORCS, but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | For all three problems, we use DDPG as the policy gradient RL algorithm (Lillicrap et al., 2016). We use a neural network with 2 hidden layers of 64 neurons each. We use the Adam optimizer (Kingma & Ba, 2014) with a learning rate of 0.001. We use a batch size of 64 and a discount factor of 0.99. We use a replay buffer of size 10^6. We found that the Adaptive Mixing Strategy performs best when λ_max = 50 and C = 0.0005. (These values are restated in the configuration sketch below the table.) |
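
The Pseudocode row above refers to Algorithm 1 (CORE-RL), whose central step regularizes the learned policy by blending its action with a control prior through the mixing weight λ. The snippet below is a hedged illustration of that mixing rule only, not the authors' released implementation: `rl_policy` and `control_prior` are hypothetical stand-ins, and the fixed λ shown here replaces the paper's adaptive strategy, which adjusts λ (up to λ_max) based on the TD error.

```python
import numpy as np

def mix_actions(u_rl, u_prior, lam):
    """CORE-RL regularized action: a convex combination of the RL policy's
    action and the control prior's action. Larger lambda puts more weight
    on the prior; lambda = 0 recovers the pure RL policy."""
    return (u_rl + lam * u_prior) / (1.0 + lam)

# Hypothetical stand-ins for a learned policy and a control prior
# (e.g., an LQR- or PID-style controller built from an approximate model).
def rl_policy(state):
    return np.tanh(np.random.randn(1))          # placeholder learned action

def control_prior(state):
    K = np.array([[-1.0, -2.0, -20.0, -5.0]])   # placeholder linear feedback gain
    return K @ state

state = np.zeros(4)
lam = 5.0   # fixed mixing weight; the paper's adaptive variant instead raises
            # lambda toward lambda_max when the TD error signals an uncertain policy
action = mix_actions(rl_policy(state), control_prior(state), lam)
```

The Experiment Setup row lists the reported hyperparameters. The dictionary below simply restates them for quick reference; the variable name and keys are illustrative and do not come from the released code at https://github.com/rcheng805/CORE-RL.

```python
# Hyperparameters as reported in the paper (key names are illustrative).
CORE_RL_DDPG_CONFIG = {
    "rl_algorithm": "DDPG",        # policy gradient learner (Lillicrap et al., 2016)
    "hidden_layers": [64, 64],     # 2 hidden layers, 64 neurons each
    "optimizer": "Adam",           # Kingma & Ba, 2014
    "learning_rate": 1e-3,
    "batch_size": 64,
    "discount_factor": 0.99,
    "replay_buffer_size": 10**6,
    # Adaptive Mixing Strategy (weighting between RL policy and control prior)
    "lambda_max": 50,
    "C": 5e-4,
}
```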