Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Iterated $Q$-Network: Beyond One-Step Bellman Updates in Deep Reinforcement Learning
Authors: Théo Vincent, Daniel Palenicek, Boris Belousov, Jan Peters, Carlo D'Eramo
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate the advantages of i-QN in Atari 2600 games and MuJoCo continuous control problems. Our code is publicly available at https://github.com/theovincent/i-DQN and the trained models are uploaded at https://huggingface.co/TheoVincent/Atari_i-QN. |
| Researcher Affiliation | Academia | Théo Vincent EMAIL DFKI, SAIROL Team & TU Darmstadt; Daniel Palenicek EMAIL TU Darmstadt & Hessian.ai; Boris Belousov EMAIL DFKI, SAIROL Team; Jan Peters EMAIL DFKI, SAIROL Team & TU Darmstadt & Hessian.ai; Carlo D'Eramo EMAIL University of Würzburg & TU Darmstadt & Hessian.ai |
| Pseudocode | Yes | Algorithm 1 Iterated Deep Q-Network (i-DQN). Modifications to DQN are marked in purple. [...] Algorithm 2 Iterated Soft Actor-Critic (i-SAC). Modifications to SAC are marked in purple. |
| Open Source Code | Yes | Our code is publicly available at https://github.com/theovincent/i-DQN and the trained models are uploaded at https://huggingface.co/TheoVincent/Atari_i-QN. |
| Open Datasets | Yes | We empirically demonstrate the advantages of i-QN in Atari 2600 games and MuJoCo continuous control problems. |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits. It describes data collection for a specific environment (car-on-hill) with 50,000 samples and batch sizes, but not how these samples are partitioned into distinct training, validation, and test sets for evaluation in a traditional supervised learning sense. For Atari and MuJoCo, the approach is based on continuous interaction with environments and learning from a replay buffer rather than fixed dataset splits. |
| Hardware Specification | Yes | Computations are made on an NVIDIA GeForce RTX 4090 Ti. |
| Software Dependencies | No | The paper mentions software like JAX and the Adam optimizer, but does not provide specific version numbers for these or other software libraries. |
| Experiment Setup | Yes | Table 3: Summary of all hyperparameters used for the Atari experiments. We note $\text{Conv}^{d}_{a,b}\,C$ a 2D convolutional layer with $C$ filters of size $a \times b$ and of stride $d$, and $\text{FC}\,E$ a fully connected layer with $E$ neurons. [...] Table 4: Summary of all hyperparameters used for the MuJoCo experiments. We note $\text{FC}\,E$ a fully connected layer with $E$ neurons. |