Exploration Conscious Reinforcement Learning Revisited

Authors: Lior Shani, Yonathan Efroni, Shie Mannor

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Building on the approaches, we apply simple changes in existing tabular and deep Reinforcement Learning algorithms and empirically demonstrate superior performance relative to their non-exploration-conscious counterparts, both for discrete and continuous action spaces. In this section, we test the theory and algorithms suggested in this work. In all experiments we used γ = 0.99. The tested DRL algorithms in this section (see Appendix B) are simple variations of DDQN (Van Hasselt et al., 2016) and DDPG (Lillicrap et al., 2015), without any parameter tuning, and based on Section 5. Table 1. Train and Test rewards for the Atari 2600 environment. Table 2. Train and Test rewards for the MuJoCo environment.
Researcher Affiliation | Academia | Lior Shani*, Yonathan Efroni*, Shie Mannor (Department of Electrical Engineering, Technion, Haifa, Israel). Correspondence to: Lior Shani <shanlior@gmail.com>, Yonathan Efroni <jonathan.efroni@gmail.com>.
Pseudocode | Yes | Algorithm 1 (Expected α-Q-Learning) and Algorithm 2 (Surrogate α-Q-Learning); a minimal illustrative sketch of the expected backup appears after this table.
Open Source Code | Yes | Implementation of the proposed algorithms can be found at https://github.com/shanlior/ExplorationConsciousRL.
Open Datasets | Yes | We used five Atari 2600 games from the ALE (Bellemare et al., 2013). We tested the Expected σ-DDPG (5) and Surrogate σ-DDPG (6) on continuous control tasks from the MuJoCo environment (Todorov et al., 2012).
Dataset Splits | No | The paper uses standard RL environments (Atari, MuJoCo), which involve continuous interaction rather than predefined static train/validation/test splits. It mentions a 'train phase' and an 'evaluation phase' but does not specify dataset percentages or sample counts in the sense of a supervised-learning split.
Hardware Specification | No | The paper does not provide specific details on the hardware used, such as GPU models, CPU types, or memory specifications for running the experiments.
Software Dependencies | No | We used the same deep neural network as in DQN (Mnih et al., 2015), using the OpenAI Baselines implementation (Dhariwal et al., 2017), without any parameter tuning, except for the update equations. (No version numbers are given.)
Experiment Setup | Yes | In all experiments we used γ = 0.99. We used α = ϵ = 0.01 in the train phase, and ϵ = 0.001 in the evaluation phase. We used the default hyper-parameters, and independent Gaussian noise with σ = 0.2, for all tasks and algorithms. (These values are collected in the second sketch below.)
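The pseudocode row above names Expected α-Q-Learning (Algorithm 1). Below is a minimal tabular sketch of what its backup could look like, assuming ε-greedy behaviour with a uniform random exploration policy; the function name, learning rate, and array layout are illustrative assumptions rather than the authors' implementation (see the linked repository for that). The Surrogate variant (Algorithm 2) is not sketched here.

```python
import numpy as np

def expected_alpha_q_update(Q, s, a, r, s_next, alpha=0.01, gamma=0.99, lr=0.1):
    """One tabular backup in the spirit of Expected alpha-Q-Learning (Algorithm 1).

    Illustrative sketch only: the target mixes the greedy value with the expected
    value under uniform exploration, reflecting that the agent will keep acting
    alpha-greedily with respect to Q.
    """
    greedy_value = np.max(Q[s_next])    # value of acting greedily in s'
    explore_value = np.mean(Q[s_next])  # expected value under uniform random exploration
    target = r + gamma * ((1.0 - alpha) * greedy_value + alpha * explore_value)
    Q[s, a] += lr * (target - Q[s, a])
    return Q
```

With α = 0 this reduces to the standard Q-learning backup, which is consistent with the quoted claim that the DRL variants only change the update equations of DDQN/DDPG.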
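For the continuous-control runs, the experiment-setup row lists only a handful of fixed values. The snippet below collects them and shows one plausible way the independent Gaussian action noise (σ = 0.2) could be added to a deterministic DDPG-style action; the helper name and the clipping range are assumptions for illustration, not the paper's code.

```python
import numpy as np

# Values quoted in the Experiment Setup row; everything else here is illustrative.
GAMMA = 0.99              # discount factor used in all experiments
ALPHA = EPS_TRAIN = 0.01  # exploration-consciousness / epsilon during the train phase
EPS_EVAL = 0.001          # smaller epsilon during the evaluation phase
SIGMA = 0.2               # std of the independent Gaussian action noise (MuJoCo tasks)

def noisy_action(actor_action, sigma=SIGMA, low=-1.0, high=1.0):
    """Add independent Gaussian exploration noise to a deterministic action (DDPG-style)."""
    noise = np.random.normal(0.0, sigma, size=np.shape(actor_action))
    return np.clip(actor_action + noise, low, high)
```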