Graying the black box: Understanding DQNs
Authors: Tom Zahavy, Nir Ben-Zrihem, Shie Mannor
ICML 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We applied our methodology on three ATARI games: Breakout, Pacman and Seaquest. For each one we give a short description of the game, analyze the optimal policy, detail the features we designed, interpret the DQN's policy and derive conclusions. |
| Researcher Affiliation | Academia | Electrical Engineering Department, The Technion - Israel Institute of Technology, Haifa 32000, Israel |
| Pseudocode | No | The paper describes methods and steps, but does not present any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any statement about releasing open-source code or provide a link to a code repository for the methodology described. |
| Open Datasets | Yes | Its success was demonstrated in the Arcade Learning Environment (ALE) (Bellemare et al., 2012), a challenging framework composed of dozens of Atari games used to evaluate general competency in AI. |
| Dataset Splits | No | The paper describes the training process and the use of experience replay, but does not specify explicit training/validation/test dataset splits (e.g., percentages or sample counts) for reproducibility. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, memory specifications, or cloud resources used for running the experiments. |
| Software Dependencies | No | The paper mentions software tools like t-SNE and Mayavi but does not provide specific version numbers for these or any other software dependencies (e.g., programming languages, deep learning frameworks, or libraries) used for the experiments. |
| Experiment Setup | Yes | The reward r_t is clipped to the range of [-1, 1] to guarantee stability when training DQNs over multiple domains with different reward scales. The DQN algorithm maintains two separate Q-networks: one with parameters θ, and a second with parameters θ_target that are updated from θ every fixed number of iterations. In order to capture the game dynamics, the DQN algorithm represents a state by a sequence of history frames and pads initial states with zero frames. |
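
The Experiment Setup row quotes three DQN training details: reward clipping to [-1, 1], a separate target network with parameters θ_target synced from the online network every fixed number of iterations, and states built from a stack of history frames padded with zero frames at episode start. The following is a minimal Python sketch of those three mechanics, under stated assumptions; the names (`clip_reward`, `FrameStack`, `sync_target`) and the 84x84 frame shape are illustrative choices, not code or settings taken from the paper.

```python
import collections
import numpy as np

def clip_reward(r):
    """Clip a raw environment reward to [-1, 1], as described in the setup quote."""
    return float(np.clip(r, -1.0, 1.0))

class FrameStack:
    """Represent a state as the last `history_len` frames,
    padding initial states with zero frames."""
    def __init__(self, history_len=4, frame_shape=(84, 84)):
        self.history_len = history_len
        self.frame_shape = frame_shape
        self.frames = collections.deque(maxlen=history_len)

    def reset(self, first_frame):
        # Pad with zero frames so the very first state already has full depth.
        for _ in range(self.history_len - 1):
            self.frames.append(np.zeros(self.frame_shape, dtype=np.float32))
        self.frames.append(first_frame)
        return self.state()

    def push(self, frame):
        # Append the newest frame; the oldest one is dropped automatically.
        self.frames.append(frame)
        return self.state()

    def state(self):
        # Stacked observation with shape (history_len, H, W).
        return np.stack(self.frames, axis=0)

def sync_target(online_params, target_params):
    """Copy the online-network parameters θ into the target network θ_target.
    In DQN this copy happens every fixed number of iterations."""
    for name, value in online_params.items():
        target_params[name] = value.copy()
```

In a training loop, `clip_reward` would be applied to every reward before it enters the replay memory, and `sync_target` would be called every C gradient steps, where C is a hyperparameter the reproducibility table does not restate.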