Regularized Q-learning through Robust Averaging
Authors: Peter Schmitt-Förster, Tobias Sutter
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Lastly, we conduct numerical experiments for various settings, which corroborate our theoretical findings and indicate that 2RA Q-learning often performs better than existing methods. |
| Researcher Affiliation | Academia | Department of Computer and Information Science, University of Konstanz, Germany. Correspondence to: Peter Schmitt-Förster <peter.schmitt-foerster@uni-konstanz.de>. |
| Pseudocode | No | The paper describes the update rules mathematically (e.g., equation (7)) but does not include a formal pseudocode or algorithm block. (A hedged sketch of a generic tabular update loop is given below the table.) |
| Open Source Code | Yes | Here: github.com/2RAQ/code |
| Open Datasets | Yes | Lastly, we conduct numerical experiments for various settings... In more practical experiments from the OpenAI Gym suite (Brockman et al., 2016) we show that, even when implementations require deviations from our theoretically required assumptions, 2RA Q-learning has good performance and mostly outperforms other Q-learning variants. |
| Dataset Splits | No | The paper mentions training episodes and evaluation, but does not provide explicit training, validation, or test dataset splits. For reinforcement learning environments, data is typically generated through interaction rather than being split from a fixed dataset. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions software like "Tensorflow (Abadi et al., 2015)", "Huber loss (Huber, 1964)", and "Adam optimizer (Kingma & Ba, 2015)", but it does not specify version numbers for these software components. |
| Experiment Setup | Yes | All methods use an initial learning rate of α0 = 0.01, wα = 10^5, and γ = 0.8. All 2RA agents additionally use wρ = 10^3. The reward function has values sampled uniformly at random from [−0.05, 0.05]. (A hedged sketch of these settings appears after the table.) |
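Since the update rule of equation (7) is not reproduced in this report, the snippet below is only a minimal sketch of a generic tabular Q-learning loop indicating where a robust-averaged bootstrap target would plug in. The `env.sample(state, action)` interface, the uniform exploration policy, and the plain sample average used as the target are illustrative assumptions, not the paper's 2RA estimator.

```python
import numpy as np

def tabular_q_learning_sketch(env, n_states, n_actions, n_steps=10_000,
                              gamma=0.8, alpha=0.01, m=5, seed=0):
    """Generic tabular Q-learning loop; the averaged target below is a
    placeholder, not the robust-averaging estimator of equation (7)."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    state = rng.integers(n_states)
    for _ in range(n_steps):
        action = rng.integers(n_actions)  # uniform exploration (assumption)
        # Draw m next-state/reward samples for the same (state, action) pair;
        # env.sample is a hypothetical generative-model interface.
        samples = [env.sample(state, action) for _ in range(m)]
        # Plain sample average of the bootstrapped values; 2RA Q-learning
        # would replace this with its robust average.
        target = np.mean([r + gamma * Q[s_next].max() for s_next, r in samples])
        Q[state, action] += alpha * (target - Q[state, action])
        state = samples[-1][0]  # continue from the last sampled next state
    return Q
```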
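The decay weights wα and wρ quoted in the setup row suggest schedules of the common polynomial form α_t = α0·wα/(wα + t), but the paper's exact schedule and the precise role of wρ are not quoted here; the following is only an assumed illustration.

```python
ALPHA_0 = 0.01   # initial learning rate (from the quoted setup)
W_ALPHA = 1e5    # learning-rate decay weight
W_RHO = 1e3      # decay weight used by the 2RA agents
GAMMA = 0.8      # discount factor

def alpha(t: int) -> float:
    """Assumed polynomially decaying learning rate: alpha_t = alpha0 * w / (w + t)."""
    return ALPHA_0 * W_ALPHA / (W_ALPHA + t)

def rho_weight(t: int, rho_0: float = 1.0) -> float:
    """Hypothetical analogous decay for the 2RA weight; rho_0 is an invented
    placeholder, not a value reported in the paper."""
    return rho_0 * W_RHO / (W_RHO + t)
```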