Graying the black box: Understanding DQNs
Authors: Tom Zahavy, Nir Ben-Zrihem, Shie Mannor
ICML 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We applied our methodology on three ATARI games: Breakout, Pacman and Seaquest. For each one we give a short description of the game, analyze the optimal policy, detail the features we designed, interpret the DQN's policy and derive conclusions. |
| Researcher Affiliation | Academia | Electrical Engineering Department, The Technion - Israel Institute of Technology, Haifa 32000, Israel |
| Pseudocode | No | The paper describes methods and steps, but does not present any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any statement about releasing open-source code or provide a link to a code repository for the methodology described. |
| Open Datasets | Yes | Its success was demonstrated in the Arcade Learning Environment (ALE) (Bellemare et al., 2012), a challenging framework composed of dozens of Atari games used to evaluate general competency in AI. |
| Dataset Splits | No | The paper describes the training process and the use of experience replay, but does not specify explicit training/validation/test dataset splits (e.g., percentages or sample counts) for reproducibility. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, memory specifications, or cloud resources used for running the experiments. |
| Software Dependencies | No | The paper mentions software tools like t-SNE and Mayavi but does not provide specific version numbers for these or any other software dependencies (e.g., programming languages, deep learning frameworks, or libraries) used for the experiments. |
| Experiment Setup | Yes | The reward r_t is clipped to the range of [-1, 1] to guarantee stability when training DQNs over multiple domains with different reward scales. The DQN algorithm maintains two separate Q-networks: one with parameters θ, and a second with parameters θ_target that are updated from θ every fixed number of iterations. In order to capture the game dynamics, the DQN algorithm represents a state by a sequence of history frames and pads initial states with zero frames. |
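
The Experiment Setup row quotes three DQN training details: reward clipping to [-1, 1], a separate target network with parameters θ_target synced from the online network every fixed number of iterations, and states built from a stack of history frames padded with zero frames at episode start. The following is a minimal Python sketch of those three mechanics, under stated assumptions; the names (`clip_reward`, `FrameStack`, `sync_target`) and the 84x84 frame shape are illustrative choices, not code or settings taken from the paper.

```python
import collections
import numpy as np

def clip_reward(r):
    """Clip a raw environment reward to [-1, 1], as described in the setup quote."""
    return float(np.clip(r, -1.0, 1.0))

class FrameStack:
    """Represent a state as the last `history_len` frames,
    padding initial states with zero frames."""
    def __init__(self, history_len=4, frame_shape=(84, 84)):
        self.history_len = history_len
        self.frame_shape = frame_shape
        self.frames = collections.deque(maxlen=history_len)

    def reset(self, first_frame):
        # Pad with zero frames so the very first state already has full depth.
        for _ in range(self.history_len - 1):
            self.frames.append(np.zeros(self.frame_shape, dtype=np.float32))
        self.frames.append(first_frame)
        return self.state()

    def push(self, frame):
        # Append the newest frame; the oldest one is dropped automatically.
        self.frames.append(frame)
        return self.state()

    def state(self):
        # Stacked observation with shape (history_len, H, W).
        return np.stack(self.frames, axis=0)

def sync_target(online_params, target_params):
    """Copy the online-network parameters θ into the target network θ_target.
    In DQN this copy happens every fixed number of iterations."""
    for name, value in online_params.items():
        target_params[name] = value.copy()
```

In a training loop, `clip_reward` would be applied to every reward before it enters the replay memory, and `sync_target` would be called every C gradient steps, where C is a hyperparameter the reproducibility table does not restate.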