Stabilizing Off-Policy Deep Reinforcement Learning from Pixels

Authors: Edoardo Cetin, Philip J. Ball, Stephen Roberts, Oya Celiktutan

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate the effectiveness of A-LIX in pixel-based reinforcement learning tasks in two popular and distinct domains featuring a diverse set of continuous and discrete control problems."
Researcher Affiliation | Academia | "¹Centre for Robotics Research, Department of Engineering, King's College London; ²Department of Engineering Science, University of Oxford."
Pseudocode | No | The paper describes its methods in prose but does not include any structured pseudocode or algorithm blocks (an illustrative sketch of the core LIX operation follows this table).
Open Source Code | Yes | "We open-source our code to facilitate reproducibility and future extensions." https://github.com/Aladoro/Stabilizing-Off-Policy-RL
Open Datasets | Yes | "We evaluate the effectiveness of A-LIX for pixel-based RL on continuous control tasks from the DeepMind Control Suite (DMC) (Tassa et al., 2018). ... We make use of the popular Atari Learning Environment (ALE) (Bellemare et al., 2013)..."
Dataset Splits | Yes | "In Table 4, we show the performance in each of the evaluated 15 DMC environments by reporting the mean and standard deviations over the cumulative returns obtained midway and at the end of training for the medium and hard benchmark tasks, respectively."
Hardware Specification | No | The paper notes that "support from Toyota Motor Corporation contributed towards funding the utilized computational resources," but gives no specifics such as the GPU/CPU models, memory, or clock speeds used for the experiments.
Software Dependencies | No | The paper lists "Optimizer Adam (Kingma & Ba, 2014)" in Tables 6 and 7, but provides no version numbers for software dependencies such as the Adam implementation or the deep learning framework used (e.g., PyTorch, TensorFlow).
Experiment Setup | Yes | "In Tables 6 and 7 we provide the full list of hyperparameters used in our implementations for DMC and Atari 100k, respectively."
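
Since the Pseudocode row notes that the paper contains no algorithm block, the following is a minimal, purely illustrative PyTorch sketch of the core LIX operation: each spatial location of an encoder feature map is shifted by a small random offset and the map is resampled bilinearly, which smooths the features and the spatial gradients flowing back into the encoder. Everything here, including the name `lix_layer` and the uniform-offset parameterization, is our assumption rather than the authors' code, and the adaptive tuning of the perturbation magnitude S that distinguishes A-LIX is omitted; see the linked repository for the actual implementation.

```python
import torch
import torch.nn.functional as F

def lix_layer(x: torch.Tensor, s: float) -> torch.Tensor:
    """Illustrative LIX-style layer (hypothetical; not the authors' code).

    Shifts every spatial location of a feature map by a random offset
    drawn from U(-s, s) (in pixel units) and resamples the map with
    bilinear interpolation, smoothing spatial discontinuities.
    """
    n, c, h, w = x.shape
    # Identity sampling grid in the normalized [-1, 1] coordinates
    # expected by grid_sample.
    ys = torch.linspace(-1.0, 1.0, h, device=x.device)
    xs = torch.linspace(-1.0, 1.0, w, device=x.device)
    gy, gx = torch.meshgrid(ys, xs, indexing="ij")
    grid = torch.stack([gx, gy], dim=-1).expand(n, h, w, 2)
    # Per-location uniform offsets, converted from pixels to normalized units.
    scale = x.new_tensor([2.0 / max(w - 1, 1), 2.0 / max(h - 1, 1)])
    offsets = (torch.rand(n, h, w, 2, device=x.device) * 2.0 - 1.0) * s * scale
    # Resample the feature map at the jittered coordinates.
    return F.grid_sample(x, (grid + offsets).clamp(-1.0, 1.0),
                         mode="bilinear", align_corners=True)

# Example: smooth a batch of encoder feature maps.
feats = torch.randn(8, 32, 21, 21, requires_grad=True)
out = lix_layer(feats, s=1.0)  # same shape as the input
```

Because grid_sample is differentiable, the operation acts on the gradients propagated back into the convolutional encoder rather than only on the forward features, which is the effect the paper targets.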