DORA The Explorer: Directed Outreaching Reinforcement Action-Selection

Authors: Lior Fox, Leshem Choshen, Yonatan Loewenstein

ICLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We compare our approach to commonly used RL techniques, and show that using E-values improves learning and performance over traditional counters. We also show how our method can be implemented with function approximation to efficiently learn continuous MDPs. We demonstrate this by showing that our approach surpasses state of the art performance in the Freeway Atari 2600 game.
Researcher Affiliation | Academia | Leshem Choshen, School of Computer Science and Engineering and Department of Cognitive Sciences, The Hebrew University of Jerusalem, leshem.choshen@mail.huji.ac.il; Lior Fox, The Edmond and Lily Safra Center for Brain Sciences, The Hebrew University of Jerusalem, lior.fox@mail.huji.ac.il; Yonatan Loewenstein, The Edmond and Lily Safra Center for Brain Sciences, Departments of Neurobiology and Cognitive Sciences and the Federmann Center for the Study of Rationality, The Hebrew University of Jerusalem, yonatan@huji.ac.il
Pseudocode | Yes | Algorithm 1: DORA algorithm using LLL determinization for stochastic policy f
Open Source Code | Yes | Supplementary code for this paper can be found at https://github.com/borgr/DORA/
Open Datasets | Yes | We demonstrate this by showing that our approach surpasses state of the art performance in the Freeway Atari 2600 game.
Dataset Splits | No | The paper evaluates on environments such as the Bridge MDP, Mountain Car, and the Freeway Atari 2600 game. Although it mentions hyperparameter fitting, it does not report explicit train/validation/test dataset splits (e.g., percentages or sample counts); such splits are typical of supervised learning and are not standard for these reinforcement learning environments.
Hardware Specification | No | The paper does not specify the hardware used to run the experiments (e.g., GPU models, CPU types, memory); it only mentions training a neural network.
Software Dependencies | No | The paper mentions using "an existing implementation for DQN and density-model counters available at https://github.com/brendanator/atari-rl", but it does not list version numbers for the key software components or libraries (e.g., Python, TensorFlow, PyTorch) that would be needed for reproducibility.
Experiment Setup | Yes | Second, we trained the network while adding an exploration bonus of β log E to the reward (In all reported simulations, β = 0.05). In both cases, action-selection was performed by an ϵ-greedy rule, as in Bellemare et al. (2016).
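
The experiment-setup excerpt above adds a bonus of β log E to the reward and selects actions ϵ-greedily. Below is a minimal tabular sketch of how such an E-value bonus can be combined with Q-learning. This is not the authors' implementation: the environment size, learning rate, discounts (including γ_E), and the ϵ threshold are illustrative assumptions; only β = 0.05 is taken from the quoted setup, and the E-value update (values initialized to 1 and decayed like a value function with zero reward) follows the paper's generalized-counter idea as described in the abstract excerpt.

```python
# Minimal tabular sketch of an E-value exploration bonus (illustrative only).
# Assumed names/values: n_states, n_actions, gamma, gamma_E, alpha, eps.
# beta = 0.05 is the value quoted in the experiment-setup excerpt above.
import numpy as np

n_states, n_actions = 10, 2
gamma, gamma_E = 0.99, 0.9          # task discount and E-value discount (assumed)
alpha, beta, eps = 0.1, 0.05, 0.1   # learning rate, bonus scale, epsilon (assumed except beta)

Q = np.zeros((n_states, n_actions))
E = np.ones((n_states, n_actions))  # E-values start at 1 ("never visited")

def select_action(s, rng):
    """Epsilon-greedy over Q, as in the quoted experiment setup."""
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[s]))

def update(s, a, r, s_next, a_next):
    # E-value update: SARSA-like, zero reward, so E decays toward 0 with visits.
    E[s, a] += alpha * (gamma_E * E[s_next, a_next] - E[s, a])
    # Shaped reward: bonus beta * log E (log E <= 0, so familiar state-action
    # pairs are penalized relative to unfamiliar ones).
    r_shaped = r + beta * np.log(E[s, a])
    # Standard Q-learning update on the shaped reward.
    Q[s, a] += alpha * (r_shaped + gamma * np.max(Q[s_next]) - Q[s, a])

# Toy usage on a hypothetical chain environment (purely illustrative):
rng = np.random.default_rng(0)
s = 0
a = select_action(s, rng)
for _ in range(1000):
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s_next == n_states - 1 else 0.0
    a_next = select_action(s_next, rng)
    update(s, a, r, s_next, a_next)
    s, a = s_next, a_next
```

Because E starts at 1 and shrinks with repeated visits, log E is zero for novel state-action pairs and increasingly negative for familiar ones, so the added term acts as a relative penalty on over-visited choices rather than a positive novelty reward.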