Regularized Q-learning through Robust Averaging

Authors: Peter Schmitt-Förster, Tobias Sutter

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Lastly, we conduct numerical experiments for various settings, which corroborate our theoretical findings and indicate that 2RA Q-learning often performs better than existing methods."
Researcher Affiliation | Academia | "Department of Computer and Information Science, University of Konstanz, Germany. Correspondence to: Peter Schmitt-Förster <peter.schmitt-foerster@uni-konstanz.de>."
Pseudocode | No | The paper describes the update rules mathematically (e.g., equation (7)) but does not include a formal pseudocode or algorithm block. (A hedged sketch of such an update appears below the table.)
Open Source Code | Yes | Available at github.com/2RAQ/code.
Open Datasets | Yes | "Lastly, we conduct numerical experiments for various settings... In more practical experiments from the OpenAI gym suite (Brockman et al., 2016) we show that, even when implementations require deviations from our theoretically required assumptions, 2RA Q-learning has good performance and mostly outperforms other Q-learning variants."
Dataset Splits | No | The paper mentions training episodes and evaluation but does not provide explicit training, validation, or test splits. In reinforcement learning, data is typically generated through interaction with the environment rather than split from a fixed dataset. (See the rollout sketch below.)
Hardware Specification | No | The paper does not specify the hardware (e.g., CPU or GPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions software such as TensorFlow (Abadi et al., 2015), the Huber loss (Huber, 1964), and the Adam optimizer (Kingma & Ba, 2015), but does not give version numbers for these components.
Experiment Setup | Yes | "All methods use an initial learning rate of α0 = 0.01, wα = 10^5, and γ = 0.8. All 2RA agents additionally use wρ = 10^3. The reward function has values random-uniformly sampled from [−0.05, 0.05]." (Collected in the config sketch below.)
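Since the paper gives its update rule only as mathematics (equation (7)), the following is a minimal tabular sketch of Q-learning with m averaged estimators, assuming a plain mean over estimators as the bootstrap target. This is not the paper's 2RA rule, which replaces the mean with a robust average controlled by a radius parameter; function and variable names here are illustrative, not taken from the authors' repository.

```python
import numpy as np

def averaged_q_update(Q, s, a, r, s_next, alpha, gamma=0.8):
    """One tabular update where the bootstrap target averages over
    m independent Q-estimates; Q has shape (m, n_states, n_actions).

    NOTE: illustrative sketch only. The paper's 2RA rule (equation (7))
    replaces the plain mean below with a robust average controlled by
    a radius parameter; that correction is omitted here.
    """
    m = Q.shape[0]
    # Average the greedy next-state values across the m estimators.
    target_value = np.mean([Q[i, s_next].max() for i in range(m)])
    target = r + gamma * target_value
    # Move one randomly chosen estimator toward the shared target,
    # as in ensemble-style Q-learning schemes.
    i = np.random.randint(m)
    Q[i, s, a] += alpha * (target - Q[i, s, a])
    return Q
```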
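To illustrate why fixed dataset splits do not apply here: in reinforcement learning the training data is a stream of transitions produced by acting in the environment. A minimal rollout loop using the OpenAI Gym suite cited by the paper (Brockman et al., 2016) is sketched below; the classic pre-0.26 `gym` reset/step signatures and the CartPole-v1 environment are assumptions made for illustration.

```python
import gym  # OpenAI Gym (Brockman et al., 2016); classic pre-0.26 API assumed

env = gym.make("CartPole-v1")  # environment choice is illustrative
obs = env.reset()
transitions = []  # RL "data" is generated online, not split from a fixed set
done = False
while not done:
    action = env.action_space.sample()  # random policy for illustration
    next_obs, reward, done, info = env.step(action)
    transitions.append((obs, action, reward, next_obs, done))
    obs = next_obs
env.close()
```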
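For reference, the reported hyperparameters can be collected in one place. The values below are taken verbatim from the quoted experiment setup; the dictionary layout and key names are illustrative, not the authors' configuration format.

```python
# Hyperparameters reported in the paper's experiment setup; the dict
# layout and key names are illustrative, not from the authors' code.
config = {
    "alpha_0": 0.01,                # initial learning rate (all methods)
    "w_alpha": 1e5,                 # learning-rate schedule weight (all methods)
    "gamma": 0.8,                   # discount factor (all methods)
    "w_rho": 1e3,                   # used by 2RA agents only
    "reward_range": (-0.05, 0.05),  # rewards sampled uniformly at random
}
```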