DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization

Authors: Aviral Kumar, Rishabh Agarwal, Tengyu Ma, Aaron Courville, George Tucker, Sergey Levine

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | From Section 5 ("Experimental Evaluation of DR3"): "Our experiments aim to evaluate the extent to which DR3 improves performance in offline RL in practice, and to study its effect on prior observations of rank collapse. To this end, we investigate if DR3 improves offline RL performance and stability on three offline RL benchmarks: Atari 2600 games with discrete actions [2], continuous control tasks from D4RL [18], and image-based robotic manipulation tasks [38]."
Researcher Affiliation | Collaboration | Aviral Kumar (UC Berkeley, Google Research), Rishabh Agarwal (Google Research, MILA), Tengyu Ma (Stanford University), Aaron Courville (MILA), George Tucker (Google Research), Sergey Levine (UC Berkeley, Google Research)
Pseudocode | No | The paper describes algorithms in text but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statement about releasing source code or provide a link to a code repository.
Open Datasets | Yes | "Atari 2600 games with discrete actions [2], continuous control tasks from D4RL [18], and image-based robotic manipulation tasks [38]. ... We ran supervised regression and three variants of approximate dynamic programming (ADP) on an offline dataset consisting of 1% of uniformly-sampled data from the replay buffer of DQN on two Atari games, previously used in Agarwal et al. [2]."
Dataset Splits | No | The paper specifies training parameters like mini-batch size and target network update period, and mentions tuning hyperparameters on a subset of games. However, it does not provide explicit percentages or sample counts for training, validation, and test dataset splits, nor does it refer to predefined splits with specific citations for reproduction.
Hardware Specification | No | The paper mentions utilizing 'compute resources from Microsoft Azure and Google Cloud' in the acknowledgements, but it does not provide specific details such as GPU/CPU models, memory, or processor types used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, or other library versions) required to reproduce the experiments.
Experiment Setup | Yes | "We utilize identical hyperparameters of the base offline RL algorithms when DR3 is used, where the base hyperparameters correspond to the ones provided in the corresponding publications. DR3 requires us to tune the additional coefficient c0, that weights the DR3 explicit regularizer term. In order to find this value on our domains, we followed the tuning strategy typically followed on Atari, where we evaluated four different values of c0 ∈ {0.001, 0.01, 0.03, 0.3} on 5 games ..." Table E.1 (hyperparameters used by the offline RL Atari agents in our experiments): mini-batch size 32; target network update period every 2000 updates; training environment steps per iteration 250K; Q-network channels 32, 64, 64; Q-network filter sizes 8×8, 4×4, 3×3; Q-network strides 4, 2, 1; Q-network hidden units 512.
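Since the paper releases no code (see the Open Source Code row), the following is only a minimal PyTorch sketch of what the Experiment Setup row describes: a Q-network with the Table E.1 architecture (channels 32/64/64, filters 8×8/4×4/3×3, strides 4/2/1, 512 hidden units) and a base TD loss augmented by a regularizer weighted by c0. The dot-product-of-consecutive-features form of the penalty is an assumption drawn from the paper's description of the DR3 regularizer, not quoted in this report, and the names AtariQNetwork, dr3_regularizer, and loss_with_dr3 are hypothetical.

```python
# Minimal sketch, not the authors' implementation.
import torch
import torch.nn as nn


class AtariQNetwork(nn.Module):
    """Q-network matching the Table E.1 hyperparameters:
    channels 32/64/64, filters 8x8/4x4/3x3, strides 4/2/1, 512 hidden units."""

    def __init__(self, num_actions: int, in_channels: int = 4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.fc = nn.Sequential(nn.LazyLinear(512), nn.ReLU())  # 512 hidden units
        self.head = nn.Linear(512, num_actions)

    def features(self, obs: torch.Tensor) -> torch.Tensor:
        # Penultimate-layer representation used by the assumed regularizer below.
        return self.fc(self.conv(obs))

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(obs))


def dr3_regularizer(q_net: AtariQNetwork, obs: torch.Tensor,
                    next_obs: torch.Tensor) -> torch.Tensor:
    """Assumed form: mean dot product of features at consecutive states in a batch."""
    return (q_net.features(obs) * q_net.features(next_obs)).sum(dim=-1).mean()


def loss_with_dr3(td_loss: torch.Tensor, q_net: AtariQNetwork, obs: torch.Tensor,
                  next_obs: torch.Tensor, c0: float = 0.03) -> torch.Tensor:
    # Total objective = base offline RL loss + c0 * regularizer, with c0 tuned
    # over {0.001, 0.01, 0.03, 0.3} on 5 games as described in the row above.
    return td_loss + c0 * dr3_regularizer(q_net, obs, next_obs)
```

The remaining Table E.1 settings (mini-batch size 32, target network update every 2000 updates, 250K training steps per iteration) would live in the surrounding training loop, which is omitted from this sketch.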