DORA The Explorer: Directed Outreaching Reinforcement Action-Selection

Authors: Lior Fox, Leshem Choshen, Yonatan Loewenstein

ICLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We compare our approach to commonly used RL techniques, and show that using E-values improves learning and performance over traditional counters. We also show how our method can be implemented with function approximation to efficiently learn continuous MDPs. We demonstrate this by showing that our approach surpasses state of the art performance in the Freeway Atari 2600 game.
Researcher Affiliation | Academia | Leshem Choshen, School of Computer Science and Engineering and Department of Cognitive Sciences, The Hebrew University of Jerusalem, leshem.choshen@mail.huji.ac.il; Lior Fox, The Edmond and Lily Safra Center for Brain Sciences, The Hebrew University of Jerusalem, lior.fox@mail.huji.ac.il; Yonatan Loewenstein, The Edmond and Lily Safra Center for Brain Sciences, Departments of Neurobiology and Cognitive Sciences and the Federmann Center for the Study of Rationality, The Hebrew University of Jerusalem, yonatan@huji.ac.il
Pseudocode | Yes | Algorithm 1: DORA algorithm using LLL determinization for stochastic policy f
Open Source Code | Yes | Supplementary code for this paper can be found at https://github.com/borgr/DORA/
Open Datasets | Yes | We demonstrate this by showing that our approach surpasses state of the art performance in the Freeway Atari 2600 game.
Dataset Splits | No | The paper evaluates on environments such as the Bridge MDP, Mountain Car, and the Freeway Atari 2600 game. Although it mentions hyperparameter fitting, it does not report explicit train/validation/test dataset splits (e.g., percentages or sample counts); such splits are typical of supervised learning and are not standard for these reinforcement learning environments.
Hardware Specification | No | The paper does not specify the hardware used to run the experiments (e.g., GPU models, CPU types, memory); it only mentions training a neural network.
Software Dependencies | No | The paper mentions using "an existing implementation for DQN and density-model counters available at https://github.com/brendanator/atari-rl", but it does not list version numbers for the key software components or libraries (e.g., Python, TensorFlow, PyTorch) that would be needed for reproducibility.
Experiment Setup | Yes | Second, we trained the network while adding an exploration bonus of β log E to the reward (In all reported simulations, β = 0.05). In both cases, action-selection was performed by an ϵ-greedy rule, as in Bellemare et al. (2016).
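
The experiment-setup excerpt above adds a bonus of β log E to the reward and selects actions ϵ-greedily. Below is a minimal tabular sketch of how such an E-value bonus can be combined with Q-learning. This is not the authors' implementation: the environment size, learning rate, discounts (including γ_E), and the ϵ threshold are illustrative assumptions; only β = 0.05 is taken from the quoted setup, and the E-value update (values initialized to 1 and decayed like a value function with zero reward) follows the paper's generalized-counter idea as described in the abstract excerpt.

```python
# Minimal tabular sketch of an E-value exploration bonus (illustrative only).
# Assumed names/values: n_states, n_actions, gamma, gamma_E, alpha, eps.
# beta = 0.05 is the value quoted in the experiment-setup excerpt above.
import numpy as np

n_states, n_actions = 10, 2
gamma, gamma_E = 0.99, 0.9          # task discount and E-value discount (assumed)
alpha, beta, eps = 0.1, 0.05, 0.1   # learning rate, bonus scale, epsilon (assumed except beta)

Q = np.zeros((n_states, n_actions))
E = np.ones((n_states, n_actions))  # E-values start at 1 ("never visited")

def select_action(s, rng):
    """Epsilon-greedy over Q, as in the quoted experiment setup."""
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[s]))

def update(s, a, r, s_next, a_next):
    # E-value update: SARSA-like, zero reward, so E decays toward 0 with visits.
    E[s, a] += alpha * (gamma_E * E[s_next, a_next] - E[s, a])
    # Shaped reward: bonus beta * log E (log E <= 0, so familiar state-action
    # pairs are penalized relative to unfamiliar ones).
    r_shaped = r + beta * np.log(E[s, a])
    # Standard Q-learning update on the shaped reward.
    Q[s, a] += alpha * (r_shaped + gamma * np.max(Q[s_next]) - Q[s, a])

# Toy usage on a hypothetical chain environment (purely illustrative):
rng = np.random.default_rng(0)
s = 0
a = select_action(s, rng)
for _ in range(1000):
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s_next == n_states - 1 else 0.0
    a_next = select_action(s_next, rng)
    update(s, a, r, s_next, a_next)
    s, a = s_next, a_next
```

Because E starts at 1 and shrinks with repeated visits, log E is zero for novel state-action pairs and increasingly negative for familiar ones, so the added term acts as a relative penalty on over-visited choices rather than a positive novelty reward.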