Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Towards Optimal Adversarial Robust Q-learning with Bellman Infinity-error
Authors: Haoran Li, Zicheng Zhang, Wang Luo, Congying Han, Yudong Hu, Tiande Guo, Shichen Liao
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we conduct extensive comparison and ablation experiments to validate the rationality of our theoretical analysis and the effectiveness of CAR-DQN. |
| Researcher Affiliation | Academia | ¹School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China. Correspondence to: Congying Han <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Consistent Adversarial Robust Deep Q-Learning (CAR-DQN). |
| Open Source Code | Yes | Our code is available at https://github.com/leoranlmia/CAR-DQN. |
| Open Datasets | Yes | We conduct experiments on four Atari video games (Brockman et al., 2016), including Pong, Freeway, Bank Heist, and Road Runner. |
| Dataset Splits | No | The paper describes data preprocessing (e.g., 'pre-process the input images into 84 × 84 grayscale images and normalize the pixel values to the range [0, 1]'), but does not explicitly state training, validation, or test dataset splits. |
| Hardware Specification | No | The paper states 'All these models are trained for 4.5 million frames on identical hardware' but does not provide specific details such as GPU/CPU models, memory, or other hardware specifications. |
| Software Dependencies | No | The paper mentions software components like 'Double Dueling DQN' and 'Adam optimizer', but does not provide specific version numbers for any software, libraries, or frameworks used. |
| Experiment Setup | Yes | We implement CAR-DQN based on Double Dueling DQN (Van Hasselt et al., 2016; Wang et al., 2016) and train all baselines and CAR-DQN for 4.5 million steps... We update the target network every 2000 steps, and set learning rate as 1.25 × 10⁻⁴, batch size as 32, exploration ϵ_exp-end as 0.01, soft coefficient λ = 1.0 and discount factor as 0.99. We use a replay buffer with a capacity of 2 × 10⁵ and Adam optimizer (Kingma & Ba, 2014) with β₁ = 0.9 and β₂ = 0.999. |
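
For readers who want to reuse the quoted setup, the hyperparameters in the Experiment Setup row can be collected into one configuration object. The sketch below is illustrative only: the class name `CarDqnConfig`, its field names, and the helper `num_target_updates` are hypothetical and are not taken from the authors' repository. Only the numeric values come from the quoted text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CarDqnConfig:
    """Hyperparameters quoted in the Experiment Setup row.

    Field names are hypothetical; only the values come from the quoted text.
    """

    total_steps: int = 4_500_000          # "4.5 million steps"
    target_update_interval: int = 2_000   # target network update period
    learning_rate: float = 1.25e-4
    batch_size: int = 32
    exploration_end: float = 0.01         # final exploration epsilon
    soft_coefficient: float = 1.0         # lambda in the quoted text
    discount: float = 0.99
    replay_capacity: int = 200_000        # 2 x 10^5 transitions
    adam_beta1: float = 0.9
    adam_beta2: float = 0.999

    def num_target_updates(self) -> int:
        """Number of target-network updates over the whole run."""
        return self.total_steps // self.target_update_interval


config = CarDqnConfig()
print(config)
```

Freezing the dataclass keeps the values immutable during a run, which makes it harder to drift from the reported setup by accident. Anything the row does not state, such as the exploration schedule before it reaches its final value, is deliberately left out.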