Graying the black box: Understanding DQNs

Authors: Tom Zahavy, Nir Ben-Zrihem, Shie Mannor

ICML 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We applied our methodology on three ATARI games: Breakout, Pacman and Seaquest. For each one we give a short description of the game, analyze the optimal policy, detail the features we designed, interpret the DQN's policy and derive conclusions.
Researcher Affiliation | Academia | Electrical Engineering Department, The Technion - Israel Institute of Technology, Haifa 32000, Israel
Pseudocode | No | The paper describes methods and steps, but does not present any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any statement about releasing open-source code or provide a link to a code repository for the methodology described.
Open Datasets | Yes | Its success was demonstrated in the Arcade Learning Environment (ALE) (Bellemare et al., 2012), a challenging framework composed of dozens of Atari games used to evaluate general competency in AI.
Dataset Splits | No | The paper describes the training process and the use of experience replay, but does not specify explicit training/validation/test dataset splits (e.g., percentages or sample counts) for reproducibility.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, memory specifications, or cloud resources used for running the experiments.
Software Dependencies | No | The paper mentions software tools like t-SNE and Mayavi but does not provide specific version numbers for these or any other software dependencies (e.g., programming languages, deep learning frameworks, or libraries) used for the experiments.
Experiment Setup | Yes | The reward r_t is clipped to the range of [-1, 1] to guarantee stability when training DQNs over multiple domains with different reward scales. The DQN algorithm maintains two separate Q-networks: one with parameters θ, and a second with parameters θ_target that are updated from θ every fixed number of iterations. In order to capture the game dynamics, the DQN algorithm represents a state by a sequence of history frames and pads initial states with zero frames.
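To make the quoted experiment-setup details concrete, below is a minimal, hypothetical Python sketch (not the authors' code) of the three mechanisms described in the last row: reward clipping to [-1, 1], a target network θ_target periodically copied from the online parameters θ, and state construction from a stack of history frames padded with zeros at the start of an episode. Names such as FrameStack, sync_target, and TARGET_UPDATE_EVERY are illustrative assumptions, not values taken from the paper.

```python
# Hypothetical sketch of the DQN training details quoted in the table above.
# It is not the paper's implementation; all names and constants are assumptions.
from collections import deque
import numpy as np

TARGET_UPDATE_EVERY = 10_000   # assumed sync interval (gradient steps)
HISTORY_LEN = 4                # standard DQN 4-frame history


def clip_reward(r: float) -> float:
    """Clip rewards to [-1, 1] so a single loss scale works across games."""
    return float(np.clip(r, -1.0, 1.0))


def sync_target(theta: dict, theta_target: dict, step: int) -> None:
    """Copy online parameters θ into θ_target every fixed number of iterations."""
    if step % TARGET_UPDATE_EVERY == 0:
        for name, value in theta.items():
            theta_target[name] = value.copy()


class FrameStack:
    """Represent a state as the last HISTORY_LEN frames; initial states are zero-padded."""

    def __init__(self, frame_shape=(84, 84)):
        self.frame_shape = frame_shape
        self.frames = deque(maxlen=HISTORY_LEN)
        self.reset()

    def reset(self) -> np.ndarray:
        # Pad the initial state with zero frames, as described in the quote.
        self.frames.clear()
        for _ in range(HISTORY_LEN):
            self.frames.append(np.zeros(self.frame_shape, dtype=np.float32))
        return np.stack(self.frames, axis=0)

    def push(self, frame: np.ndarray) -> np.ndarray:
        # Append the newest frame and return the stacked state (HISTORY_LEN, H, W).
        self.frames.append(frame.astype(np.float32))
        return np.stack(self.frames, axis=0)
```

In a training loop, clip_reward would be applied to each environment reward before storing the transition in the replay buffer, and sync_target would be called once per gradient step so that θ_target lags θ by a fixed interval.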