Maximizing Ensemble Diversity in Deep Reinforcement Learning

Authors: Hassam Sheikh, Mariano Phielipp, Ladislau Boloni

ICLR 2022

Reproducibility assessment. Each entry below gives the reproducibility variable, the assessed result, and the supporting LLM response.
Research Type: Experimental. We integrated MED-RL in five of the most common ensemble-based deep RL algorithms for both continuous and discrete control tasks and evaluated on six MuJoCo environments and six Atari games. Our results show that MED-RL-augmented algorithms significantly outperform their unregularized counterparts and in some cases achieve performance gains of more than 300%.
Researcher Affiliation: Collaboration. Hassam Ullah Sheikh, Intel Labs (hassam.sheikh@intel.com); Mariano Phielipp, Intel Labs (mariano.j.phielipp@intel.com); Ladislau Bölöni, Department of Computer Science, University of Central Florida (lboloni@cs.ucf.edu).
Pseudocode: Yes. Algorithm 1 (MED-RL: Maxmin DQN version); the differences between the baseline Maxmin DQN and MED-RL Maxmin DQN are highlighted. Initialize N Q-functions {Q1, ..., QN} parameterized by {ψ1, ..., ψN}; initialize an empty replay buffer D; observe the initial state s; while the agent is interacting with the environment do ...
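
As a reading aid, the following is a minimal sketch of the loop structure that Algorithm 1 describes, assuming a generic diversity-regularization term: med_rl_penalty is a placeholder standing in for the paper's actual regularizers, and all class and function names are illustrative rather than the authors' code.

```python
# Minimal sketch of the ensemble training loop that Algorithm 1 outlines,
# with a placeholder diversity penalty standing in for MED-RL's regularizers.
# All names (QNetwork, med_rl_penalty, train_step) are illustrative.
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

class QNetwork(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

def med_rl_penalty(q_values_per_head: list) -> torch.Tensor:
    # Placeholder diversity term: reward disagreement between the N heads by
    # penalizing low variance across the ensemble (NOT the paper's exact terms).
    stacked = torch.stack(q_values_per_head)  # (N, batch, actions)
    return -stacked.var(dim=0).mean()

def train_step(q_nets, target_nets, optimizers, batch,
               gamma: float = 0.99, reg_weight: float = 0.5) -> float:
    obs, actions, rewards, next_obs, dones = batch
    with torch.no_grad():
        # Maxmin target: minimum over the ensemble, then maximum over actions.
        min_q_next = torch.stack([t(next_obs) for t in target_nets]).min(dim=0).values
        target = rewards + gamma * (1.0 - dones) * min_q_next.max(dim=1).values
    # Maxmin DQN updates one randomly chosen head per step.
    i = random.randrange(len(q_nets))
    q_sa = q_nets[i](obs).gather(1, actions.unsqueeze(1)).squeeze(1)
    loss = F.mse_loss(q_sa, target) + reg_weight * med_rl_penalty([q(obs) for q in q_nets])
    optimizers[i].zero_grad()
    loss.backward()
    optimizers[i].step()
    return loss.item()
```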
Open Source Code: No. For our implementation of Maxmin DQN and Ensemble DQN, we used the code provided by the Maxmin DQN authors, which has implementations of different DQN-based methods (github.com/qlan3/Explorer). This refers to a third-party repository that was used as a base, not to the open-sourcing of the authors' own MED-RL code or modifications.
Open Datasets: Yes. We integrated MED-RL in five of the most common ensemble-based deep RL algorithms for both continuous and discrete control tasks and evaluated on six MuJoCo environments and six Atari games.
Dataset Splits: No. The paper describes training durations (e.g., 'after 1M timesteps', 'for 300K timesteps only') and evaluation roll-outs, which are standard for reinforcement learning. However, it does not specify explicit training/validation/test dataset splits in the typical supervised-learning sense (e.g., 80% training, 10% validation, 10% test).
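
For readers used to supervised splits, the protocol implied here (a training-timestep budget plus periodic evaluation roll-outs) looks roughly like the sketch below; the agent/environment interface, the timestep budget, and the evaluation frequency are illustrative assumptions, not values from the paper.

```python
# Sketch of the timestep-budget + periodic-evaluation protocol that replaces
# train/validation/test splits in RL. Assumes a classic Gym-style step() API;
# agent.act / agent.observe and all numbers are illustrative placeholders.
def evaluate(agent, env, episodes: int = 10) -> float:
    returns = []
    for _ in range(episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            obs, reward, done, _ = env.step(agent.act(obs, greedy=True))
            total += reward
        returns.append(total)
    return sum(returns) / len(returns)

def train(agent, train_env, eval_env,
          total_timesteps: int = 1_000_000, eval_every: int = 10_000) -> None:
    obs = train_env.reset()
    for t in range(total_timesteps):
        action = agent.act(obs)
        next_obs, reward, done, _ = train_env.step(action)
        agent.observe(obs, action, reward, next_obs, done)
        obs = train_env.reset() if done else next_obs
        if (t + 1) % eval_every == 0:
            print(f"step {t + 1}: mean evaluation return = {evaluate(agent, eval_env):.1f}")
```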
Hardware Specification: Yes. All the experiments were performed on a Kubernetes-managed cluster with Nvidia V100 GPUs and Intel Skylake CPUs. Each experiment was run as an individual Kubernetes job with 11 CPUs, 16 GB of RAM, and 1 GPU (if needed). This configuration allowed us to run experiments without interference from other applications, which was important to accurately measure wall-clock time.
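
For context, a per-experiment Kubernetes Job with the quoted resource requests could be submitted along the lines of the sketch below; the 11-CPU / 16 GB / 1-GPU figures come from the quoted text, while the image name, namespace, and command are placeholders.

```python
# Sketch of submitting one experiment as a Kubernetes Job with the quoted
# resources (11 CPUs, 16 GiB RAM, 1 GPU). Image, command, and namespace are
# placeholders; requires the official `kubernetes` Python client.
from kubernetes import client, config

def submit_experiment(name: str, command: list) -> None:
    config.load_kube_config()  # or config.load_incluster_config() in-cluster
    resources = client.V1ResourceRequirements(
        requests={"cpu": "11", "memory": "16Gi", "nvidia.com/gpu": "1"},
        limits={"cpu": "11", "memory": "16Gi", "nvidia.com/gpu": "1"},
    )
    container = client.V1Container(
        name=name,
        image="registry.example.com/med-rl-experiments:latest",  # placeholder
        command=command,
        resources=resources,
    )
    template = client.V1PodTemplateSpec(
        spec=client.V1PodSpec(containers=[container], restart_policy="Never")
    )
    job = client.V1Job(
        api_version="batch/v1",
        kind="Job",
        metadata=client.V1ObjectMeta(name=name),
        spec=client.V1JobSpec(template=template, backoff_limit=0),
    )
    client.BatchV1Api().create_namespaced_job(namespace="default", body=job)
```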
Software Dependencies: No. The paper states, 'For our implementation of Maxmin DQN and Ensemble DQN, we used the code provided by the Maxmin DQN authors that has implementations of different DQN based methods (github.com/qlan3/Explorer)'. While this names a specific code base, it does not provide version numbers for any software dependencies, such as Python, PyTorch/TensorFlow, or other libraries critical for reproducibility.
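
One way to close the gap this entry points out is to record interpreter and library versions with every run, e.g. as in the sketch below; the library list is illustrative, since the paper does not name its exact dependencies.

```python
# Sketch: log the interpreter and library versions that the paper omits, so a
# run can be reproduced later. The library list is illustrative only.
import platform
import torch
import numpy
import gym

print("python", platform.python_version())
print("torch ", torch.__version__)
print("numpy ", numpy.__version__)
print("gym   ", gym.__version__)
```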
Experiment Setup: Yes. Table 11 (hyperparameters for discrete control tasks) and Table 12 (hyperparameters for continuous control tasks) list specific values for target weight, actor learning rate, critic learning rate, replay buffer, batch size, exploration steps, optimizer, hidden layer size, number of critics (REDQ), and regularization weight.
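
For reference, the hyperparameter categories named above map naturally onto a configuration record like the sketch below; the values are deliberately left as placeholders because the actual numbers live in Tables 11 and 12 of the paper and are not reproduced here.

```python
# Sketch of a hyperparameter record mirroring the categories listed for
# Tables 11 and 12; values are placeholders (None), not the paper's numbers.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ContinuousControlConfig:
    target_weight: Optional[float] = None         # Polyak/target update weight
    actor_learning_rate: Optional[float] = None
    critic_learning_rate: Optional[float] = None
    replay_buffer: Optional[int] = None
    batch_size: Optional[int] = None
    exploration_steps: Optional[int] = None
    optimizer: Optional[str] = None               # name of the optimizer used
    hidden_layer_size: Optional[int] = None
    num_critics_redq: Optional[int] = None        # number of critics (REDQ)
    regularization_weight: Optional[float] = None # MED-RL regularization weight
```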