DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization

Authors: Aviral Kumar, Rishabh Agarwal, Tengyu Ma, Aaron Courville, George Tucker, Sergey Levine

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | From Section 5 ("Experimental Evaluation of DR3"): "Our experiments aim to evaluate the extent to which DR3 improves performance in offline RL in practice, and to study its effect on prior observations of rank collapse. To this end, we investigate if DR3 improves offline RL performance and stability on three offline RL benchmarks: Atari 2600 games with discrete actions [2], continuous control tasks from D4RL [18], and image-based robotic manipulation tasks [38]."
Researcher Affiliation | Collaboration | Aviral Kumar (UC Berkeley, Google Research), Rishabh Agarwal (Google Research, MILA), Tengyu Ma (Stanford University), Aaron Courville (MILA), George Tucker (Google Research), Sergey Levine (UC Berkeley, Google Research)
Pseudocode | No | The paper describes algorithms in text but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statement about releasing source code or provide a link to a code repository.
Open Datasets | Yes | "Atari 2600 games with discrete actions [2], continuous control tasks from D4RL [18], and image-based robotic manipulation tasks [38]. ... We ran supervised regression and three variants of approximate dynamic programming (ADP) on an offline dataset consisting of 1% of uniformly-sampled data from the replay buffer of DQN on two Atari games, previously used in Agarwal et al. [2]."
Dataset Splits | No | The paper specifies training parameters like mini-batch size and target network update period, and mentions tuning hyperparameters on a subset of games. However, it does not provide explicit percentages or sample counts for training, validation, and test dataset splits, nor does it refer to predefined splits with specific citations for reproduction.
Hardware Specification | No | The paper mentions utilizing 'compute resources from Microsoft Azure and Google Cloud' in the acknowledgements, but it does not provide specific details such as GPU/CPU models, memory, or processor types used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, or other library versions) required to reproduce the experiments.
Experiment Setup | Yes | "We utilize identical hyperparameters of the base offline RL algorithms when DR3 is used, where the base hyperparameters correspond to the ones provided in the corresponding publications. DR3 requires us to tune the additional coefficient c0, that weights the DR3 explicit regularizer term. In order to find this value on our domains, we followed the tuning strategy typically followed on Atari, where we evaluated four different values of c0 ∈ {0.001, 0.01, 0.03, 0.3} on 5 games ..." Table E.1 (hyperparameters used by the offline RL Atari agents in our experiments): mini-batch size 32; target network update period every 2000 updates; training environment steps per iteration 250K; Q-network channels 32, 64, 64; Q-network filter sizes 8×8, 4×4, 3×3; Q-network strides 4, 2, 1; Q-network hidden units 512.
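Since the paper releases no code (see the Open Source Code row), the following is only a minimal PyTorch sketch of what the Experiment Setup row describes: a Q-network with the Table E.1 architecture (channels 32/64/64, filters 8×8/4×4/3×3, strides 4/2/1, 512 hidden units) and a base TD loss augmented by a regularizer weighted by c0. The dot-product-of-consecutive-features form of the penalty is an assumption drawn from the paper's description of the DR3 regularizer, not quoted in this report, and the names AtariQNetwork, dr3_regularizer, and loss_with_dr3 are hypothetical.

```python
# Minimal sketch, not the authors' implementation.
import torch
import torch.nn as nn


class AtariQNetwork(nn.Module):
    """Q-network matching the Table E.1 hyperparameters:
    channels 32/64/64, filters 8x8/4x4/3x3, strides 4/2/1, 512 hidden units."""

    def __init__(self, num_actions: int, in_channels: int = 4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.fc = nn.Sequential(nn.LazyLinear(512), nn.ReLU())  # 512 hidden units
        self.head = nn.Linear(512, num_actions)

    def features(self, obs: torch.Tensor) -> torch.Tensor:
        # Penultimate-layer representation used by the assumed regularizer below.
        return self.fc(self.conv(obs))

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(obs))


def dr3_regularizer(q_net: AtariQNetwork, obs: torch.Tensor,
                    next_obs: torch.Tensor) -> torch.Tensor:
    """Assumed form: mean dot product of features at consecutive states in a batch."""
    return (q_net.features(obs) * q_net.features(next_obs)).sum(dim=-1).mean()


def loss_with_dr3(td_loss: torch.Tensor, q_net: AtariQNetwork, obs: torch.Tensor,
                  next_obs: torch.Tensor, c0: float = 0.03) -> torch.Tensor:
    # Total objective = base offline RL loss + c0 * regularizer, with c0 tuned
    # over {0.001, 0.01, 0.03, 0.3} on 5 games as described in the row above.
    return td_loss + c0 * dr3_regularizer(q_net, obs, next_obs)
```

The remaining Table E.1 settings (mini-batch size 32, target network update every 2000 updates, 250K training steps per iteration) would live in the surrounding training loop, which is omitted from this sketch.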