Learning Pessimism for Reinforcement Learning
Authors: Edoardo Cetin, Oya Celiktutan
AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate the effectiveness of GPL, we integrate it with two popular off-policy RL algorithms. ... We show that GPL significantly improves the performance and robustness of off-policy RL, concretely surpassing prior algorithms and setting new state-of-the-art results. In our evaluation, we repeat each experiment with five random seeds and record both mean and standard deviation over the episodic returns. Moreover, we validate statistical significance using tools from Rliable (Agarwal et al. 2021). In the extended version (Cetin and Celiktutan 2021), we report all details of our experimental settings and utilized hyper-parameters. We also provide comprehensive extended results analyzing the impact of all relevant design choices, testing several alternative implementations, and reporting all training times. (See the Rliable aggregation sketch after this table.) |
| Researcher Affiliation | Academia | Edoardo Cetin1, Oya Celiktutan1 1 King's College London edoardo.cetin@kcl.ac.uk, oya.celiktutan@kcl.ac.uk |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We share our code to facilitate future extensions. |
| Open Datasets | Yes | On challenging MuJoCo tasks from OpenAI Gym (Todorov, Erez, and Tassa 2012; Brockman et al. 2016), GPL-SAC outperforms both model-based (Janner et al. 2019) and model-free (Chen et al. 2021) state-of-the-art algorithms, while being more computationally efficient. Additionally, on pixel-based environments from the DeepMind Control Suite (Tassa et al. 2018), GPL-DrQ provides significant performance improvements from the recent state-of-the-art DrQv2 algorithm. |
| Dataset Splits | No | The paper describes evaluation procedures ('We collect the returns over five evaluation episodes every 1000 environment steps', 'For each run, we average the returns from 100 evaluation episodes') and repetitions with random seeds, but it does not specify explicit training, validation, or test dataset splits in terms of data samples or percentages. |
| Hardware Specification | No | The paper mentions evaluating the algorithm 'under the same hardware' but does not provide specific details about the hardware used (e.g., GPU model, CPU model, memory). |
| Software Dependencies | No | The paper mentions the use of popular RL algorithms (SAC, DrQ) and environments (OpenAI Gym, DeepMind Control Suite), along with a tool for statistical significance (Rliable), but it does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | Specifically, we only substitute SAC's clipped double Q-learning with our uncertainty regularizer, initialized with β = 0.5. In line with the other considered state-of-the-art baselines (Chen et al. 2021; Janner et al. 2019), we use an increased ensemble size and update-to-data (UTD) ratio for the critic. (See the pessimistic-target sketch after this table.) |
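
The evaluation row quotes a protocol of five random seeds with mean and standard deviation recorded over episodic returns, plus statistical validation with Rliable (Agarwal et al. 2021). Below is a minimal sketch of that kind of aggregation with the `rliable` library, assuming placeholder return arrays shaped `(seeds, tasks)`; the algorithm names and values are illustrative, not data from the paper.

```python
# Hedged sketch: per-seed aggregation (mean and standard deviation over five
# seeds) plus bootstrap interval estimates with Rliable, as the evaluation
# protocol quoted in the table describes. All values are placeholders.
import numpy as np
from rliable import library as rly
from rliable import metrics

# Assumed layout: scores[algorithm] is a (num_seeds, num_tasks) array of
# final episodic returns; five seeds, as in the quoted protocol.
scores = {
    "GPL-SAC": np.random.rand(5, 6),  # placeholder returns
    "SAC": np.random.rand(5, 6),      # placeholder returns
}

# Mean and standard deviation of returns over seeds, per task.
for name, runs in scores.items():
    print(name, "mean:", runs.mean(axis=0), "std:", runs.std(axis=0))

# Interquartile mean (IQM) with bootstrap confidence intervals, the aggregate
# Rliable recommends when only a handful of runs per task are available.
iqm = lambda s: np.array([metrics.aggregate_iqm(s)])
point_estimates, interval_estimates = rly.get_interval_estimates(
    scores, iqm, reps=2000)
print(point_estimates)
print(interval_estimates)
```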
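
The setup row describes replacing SAC's clipped double Q-learning with an uncertainty regularizer initialized at β = 0.5, alongside a larger critic ensemble and UTD ratio. The sketch below shows one common reading of such a pessimistic ensemble target (ensemble mean minus β times the ensemble's disagreement); it is an illustrative assumption, not the paper's exact GPL objective, and the names `pessimistic_target`, `q_ensemble`, and the ensemble and batch sizes are hypothetical.

```python
# Hedged sketch of an uncertainty-penalized critic target: ensemble mean minus
# a pessimism coefficient beta times the ensemble's standard deviation.
# This is an assumed illustration of the quoted setup, not the paper's exact
# GPL formulation; beta = 0.5 matches only the initialization mentioned above.
import torch


def pessimistic_target(q_ensemble: torch.Tensor, beta: torch.Tensor) -> torch.Tensor:
    """Penalize the ensemble mean by beta times the ensemble disagreement.

    q_ensemble: (ensemble_size, batch) next-state Q-value predictions.
    beta: scalar pessimism coefficient (learnable in GPL's framing).
    """
    mean_q = q_ensemble.mean(dim=0)
    std_q = q_ensemble.std(dim=0)
    return mean_q - beta * std_q


# Usage with the initialization quoted in the setup row (beta = 0.5).
beta = torch.tensor(0.5, requires_grad=True)  # learnable coefficient (assumed)
q_ensemble = torch.randn(10, 256)             # assumed ensemble size 10, batch 256
target_q = pessimistic_target(q_ensemble, beta)
print(target_q.shape)  # torch.Size([256])
```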