Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Graying the black box: Understanding DQNs
Authors: Tom Zahavy, Nir Ben-Zrihem, Shie Mannor
ICML 2016 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We applied our methodology on three ATARI games: Breakout, Pacman and Seaquest. For each one we give a short description of the game, analyze the optimal policy, detail the features we designed, interpret the DQN's policy and derive conclusions. |
| Researcher Affiliation | Academia | Electrical Engineering Department, The Technion - Israel Institute of Technology, Haifa 32000, Israel |
| Pseudocode | No | The paper describes methods and steps, but does not present any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any statement about releasing open-source code or provide a link to a code repository for the methodology described. |
| Open Datasets | Yes | Its success was demonstrated in the Arcade Learning Environment (ALE) (Bellemare et al., 2012), a challenging framework composed of dozens of Atari games used to evaluate general competency in AI. |
| Dataset Splits | No | The paper describes the training process and the use of experience replay, but does not specify explicit training/validation/test dataset splits (e.g., percentages or sample counts) for reproducibility. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, memory specifications, or cloud resources used for running the experiments. |
| Software Dependencies | No | The paper mentions software tools like t-SNE and Mayavi but does not provide specific version numbers for these or any other software dependencies (e.g., programming languages, deep learning frameworks, or libraries) used for the experiments. |
| Experiment Setup | Yes | The reward r_t is clipped to the range of [-1, 1] to guarantee stability when training DQNs over multiple domains with different reward scales. The DQN algorithm maintains two separate Q-networks: one with parameters θ, and a second with parameters θ_target that are updated from θ every fixed number of iterations. In order to capture the game dynamics, the DQN algorithm represents a state by a sequence of history frames and pads initial states with zero frames. |
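
The three setup details quoted in the Experiment Setup row (reward clipping to [-1, 1], a target network refreshed every fixed number of iterations, and a zero-padded frame history) can be illustrated with a minimal sketch. This is not the authors' code. The names `clip_reward`, `FrameHistory`, and `TargetSync` are hypothetical, as are the history length, frame shape, and update period in the usage lines, none of which are stated in the excerpt above. Parameters are modeled as a plain dictionary of floats rather than real network weights.

```python
from collections import deque


def clip_reward(r, low=-1.0, high=1.0):
    """Clip a raw game reward into [low, high] (here [-1, 1])."""
    return max(low, min(high, r))


class FrameHistory:
    """Keeps the most recent `length` frames as the state.

    Before enough real frames have arrived, the missing slots are
    filled with all-zero frames (the zero padding described above).
    """

    def __init__(self, length, frame_shape):
        rows, cols = frame_shape
        zero_frame = [[0] * cols for _ in range(rows)]
        self._frames = deque([zero_frame] * length, maxlen=length)

    def push(self, frame):
        """Add a new frame and return the current state (oldest first)."""
        self._frames.append(frame)
        return list(self._frames)


class TargetSync:
    """Copies the online parameters into a target copy every `period` steps."""

    def __init__(self, params, period):
        self.period = period
        self.step_count = 0
        self.target = dict(params)  # shallow copy; values are assumed to be floats

    def step(self, params):
        """Count one training iteration and refresh the target copy if due."""
        self.step_count += 1
        if self.step_count % self.period == 0:
            self.target = dict(params)
        return self.target


# Hypothetical usage: 4-frame history of 2x2 frames, target refresh every 3 steps.
history = FrameHistory(length=4, frame_shape=(2, 2))
state = history.push([[1, 1], [1, 1]])  # three zero frames, then the real frame
reward = clip_reward(10)                # large raw reward is clipped to the upper bound
sync = TargetSync({"w": 0.0}, period=3)
```

Keeping the target parameters as a separate copy that changes only every `period` steps is the point of the second class: between refreshes, updates to the online parameters do not move the target values.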