Maximizing Ensemble Diversity in Deep Reinforcement Learning

Authors: Hassam Sheikh, Mariano Phielipp, Ladislau Boloni

ICLR 2022

Reproducibility assessment. Each entry below gives the reproducibility variable, the assessed result, and the supporting LLM response.
Research Type: Experimental. We integrated MED-RL in five of the most common ensemble-based deep RL algorithms for both continuous and discrete control tasks and evaluated on six MuJoCo environments and six Atari games. Our results show that MED-RL-augmented algorithms significantly outperform their unregularized counterparts and in some cases achieve performance gains of more than 300%.
Researcher Affiliation: Collaboration. Hassam Ullah Sheikh, Intel Labs (hassam.sheikh@intel.com); Mariano Phielipp, Intel Labs (mariano.j.phielipp@intel.com); Ladislau Bölöni, Department of Computer Science, University of Central Florida (lboloni@cs.ucf.edu).
Pseudocode: Yes. Algorithm 1 (MED-RL: Maxmin DQN version); the differences between the baseline Maxmin DQN and MED-RL Maxmin DQN are highlighted. Initialize N Q-functions {Q1, ..., QN} parameterized by {ψ1, ..., ψN}; initialize an empty replay buffer D; observe the initial state s; while the agent is interacting with the environment do ...
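
As a reading aid, the following is a minimal sketch of the loop structure that Algorithm 1 describes, assuming a generic diversity-regularization term: med_rl_penalty is a placeholder standing in for the paper's actual regularizers, and all class and function names are illustrative rather than the authors' code.

```python
# Minimal sketch of the ensemble training loop that Algorithm 1 outlines,
# with a placeholder diversity penalty standing in for MED-RL's regularizers.
# All names (QNetwork, med_rl_penalty, train_step) are illustrative.
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

class QNetwork(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

def med_rl_penalty(q_values_per_head: list) -> torch.Tensor:
    # Placeholder diversity term: reward disagreement between the N heads by
    # penalizing low variance across the ensemble (NOT the paper's exact terms).
    stacked = torch.stack(q_values_per_head)  # (N, batch, actions)
    return -stacked.var(dim=0).mean()

def train_step(q_nets, target_nets, optimizers, batch,
               gamma: float = 0.99, reg_weight: float = 0.5) -> float:
    obs, actions, rewards, next_obs, dones = batch
    with torch.no_grad():
        # Maxmin target: minimum over the ensemble, then maximum over actions.
        min_q_next = torch.stack([t(next_obs) for t in target_nets]).min(dim=0).values
        target = rewards + gamma * (1.0 - dones) * min_q_next.max(dim=1).values
    # Maxmin DQN updates one randomly chosen head per step.
    i = random.randrange(len(q_nets))
    q_sa = q_nets[i](obs).gather(1, actions.unsqueeze(1)).squeeze(1)
    loss = F.mse_loss(q_sa, target) + reg_weight * med_rl_penalty([q(obs) for q in q_nets])
    optimizers[i].zero_grad()
    loss.backward()
    optimizers[i].step()
    return loss.item()
```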
Open Source Code: No. For our implementation of Maxmin DQN and Ensemble DQN, we used the code provided by the Maxmin DQN authors, which has implementations of different DQN-based methods (github.com/qlan3/Explorer). This refers to a third-party repository that was used as a base, not to the open-sourcing of the authors' own MED-RL code or modifications.
Open Datasets: Yes. We integrated MED-RL in five of the most common ensemble-based deep RL algorithms for both continuous and discrete control tasks and evaluated on six MuJoCo environments and six Atari games.
Dataset Splits: No. The paper describes training durations (e.g., 'after 1M timesteps', 'for 300K timesteps only') and evaluation roll-outs, which are standard for reinforcement learning. However, it does not specify explicit training/validation/test dataset splits in the typical supervised-learning sense (e.g., 80% training, 10% validation, 10% test).
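
For readers used to supervised splits, the protocol implied here (a training-timestep budget plus periodic evaluation roll-outs) looks roughly like the sketch below; the agent/environment interface, the timestep budget, and the evaluation frequency are illustrative assumptions, not values from the paper.

```python
# Sketch of the timestep-budget + periodic-evaluation protocol that replaces
# train/validation/test splits in RL. Assumes a classic Gym-style step() API;
# agent.act / agent.observe and all numbers are illustrative placeholders.
def evaluate(agent, env, episodes: int = 10) -> float:
    returns = []
    for _ in range(episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            obs, reward, done, _ = env.step(agent.act(obs, greedy=True))
            total += reward
        returns.append(total)
    return sum(returns) / len(returns)

def train(agent, train_env, eval_env,
          total_timesteps: int = 1_000_000, eval_every: int = 10_000) -> None:
    obs = train_env.reset()
    for t in range(total_timesteps):
        action = agent.act(obs)
        next_obs, reward, done, _ = train_env.step(action)
        agent.observe(obs, action, reward, next_obs, done)
        obs = train_env.reset() if done else next_obs
        if (t + 1) % eval_every == 0:
            print(f"step {t + 1}: mean evaluation return = {evaluate(agent, eval_env):.1f}")
```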
Hardware Specification: Yes. All the experiments were performed on a Kubernetes-managed cluster with Nvidia V100 GPUs and Intel Skylake CPUs. Each experiment was run as an individual Kubernetes job with 11 CPUs, 16 GB of RAM, and 1 GPU (if needed). This configuration allowed us to run experiments without interference from other applications, which was important to accurately measure wall-clock time.
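
For context, a per-experiment Kubernetes Job with the quoted resource requests could be submitted along the lines of the sketch below; the 11-CPU / 16 GB / 1-GPU figures come from the quoted text, while the image name, namespace, and command are placeholders.

```python
# Sketch of submitting one experiment as a Kubernetes Job with the quoted
# resources (11 CPUs, 16 GiB RAM, 1 GPU). Image, command, and namespace are
# placeholders; requires the official `kubernetes` Python client.
from kubernetes import client, config

def submit_experiment(name: str, command: list) -> None:
    config.load_kube_config()  # or config.load_incluster_config() in-cluster
    resources = client.V1ResourceRequirements(
        requests={"cpu": "11", "memory": "16Gi", "nvidia.com/gpu": "1"},
        limits={"cpu": "11", "memory": "16Gi", "nvidia.com/gpu": "1"},
    )
    container = client.V1Container(
        name=name,
        image="registry.example.com/med-rl-experiments:latest",  # placeholder
        command=command,
        resources=resources,
    )
    template = client.V1PodTemplateSpec(
        spec=client.V1PodSpec(containers=[container], restart_policy="Never")
    )
    job = client.V1Job(
        api_version="batch/v1",
        kind="Job",
        metadata=client.V1ObjectMeta(name=name),
        spec=client.V1JobSpec(template=template, backoff_limit=0),
    )
    client.BatchV1Api().create_namespaced_job(namespace="default", body=job)
```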
Software Dependencies: No. The paper states, 'For our implementation of Maxmin DQN and Ensemble DQN, we used the code provided by the Maxmin DQN authors that has implementations of different DQN based methods (github.com/qlan3/Explorer)'. While this names a specific code base, it does not provide version numbers for any software dependencies, such as Python, PyTorch/TensorFlow, or other libraries critical for reproducibility.
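
One way to close the gap this entry points out is to record interpreter and library versions with every run, e.g. as in the sketch below; the library list is illustrative, since the paper does not name its exact dependencies.

```python
# Sketch: log the interpreter and library versions that the paper omits, so a
# run can be reproduced later. The library list is illustrative only.
import platform
import torch
import numpy
import gym

print("python", platform.python_version())
print("torch ", torch.__version__)
print("numpy ", numpy.__version__)
print("gym   ", gym.__version__)
```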
Experiment Setup: Yes. Table 11 (hyperparameters for discrete control tasks) and Table 12 (hyperparameters for continuous control tasks) list specific values for target weight, actor learning rate, critic learning rate, replay buffer, batch size, exploration steps, optimizer, hidden layer size, number of critics (REDQ), and regularization weight.
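
For reference, the hyperparameter categories named above map naturally onto a configuration record like the sketch below; the values are deliberately left as placeholders because the actual numbers live in Tables 11 and 12 of the paper and are not reproduced here.

```python
# Sketch of a hyperparameter record mirroring the categories listed for
# Tables 11 and 12; values are placeholders (None), not the paper's numbers.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ContinuousControlConfig:
    target_weight: Optional[float] = None         # Polyak/target update weight
    actor_learning_rate: Optional[float] = None
    critic_learning_rate: Optional[float] = None
    replay_buffer: Optional[int] = None
    batch_size: Optional[int] = None
    exploration_steps: Optional[int] = None
    optimizer: Optional[str] = None               # name of the optimizer used
    hidden_layer_size: Optional[int] = None
    num_critics_redq: Optional[int] = None        # number of critics (REDQ)
    regularization_weight: Optional[float] = None # MED-RL regularization weight
```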