Principled Exploration via Optimistic Bootstrapping and Backward Induction

Authors: Chenjia Bai, Lingxiao Wang, Lei Han, Jianye Hao, Animesh Garg, Peng Liu, Zhaoran Wang

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments in the MNIST maze and Atari suite suggest that OB2I outperforms several state-of-the-art exploration approaches. We evaluate OB2I empirically by solving MNIST maze and 49 Atari games.
Researcher Affiliation | Collaboration | 1 Harbin Institute of Technology, Harbin, China; 2 Northwestern University, Evanston, USA; 3 Tencent Robotics X; 4 Tianjin University; 5 University of Toronto, Vector Institute
Pseudocode | Yes | Algorithm 1 LSVI-UCB in linear MDP
Open Source Code | Yes | The code is available at https://github.com/Baichenjia/OB2I.
Open Datasets | Yes | We evaluate the algorithms in high-dimensional image-based tasks, including MNIST Maze (Lee et al., 2019) and 49 Atari games.
Dataset Splits | No | The paper describes training frames and evaluation for RL environments, but does not specify dataset splits (e.g., percentages or counts) for a separate validation set, as would be typical for experiments on static datasets.
Hardware Specification | Yes | BEBU, BEBU-UCB, BEBU-IDS and OB2I are trained for 20M frames with RTX-2080Ti GPU for 5 random seeds.
Software Dependencies | No | The paper mentions software concepts like 'Deep Reinforcement Learning' and 'Bootstrapped DQN', but does not provide specific version numbers for any software libraries, frameworks, or languages used in the experiments.
Experiment Setup | Yes | For OB2I, we set both α1 and α2 as 0.5 × 10⁻⁴ by tuning over five popular tasks, including Breakout, Freeway, Qbert, Seaquest, and Space Invaders. Generally, small α1 and α2 yield better performance empirically since the bonus accumulates along the episode that usually contains thousands of steps in Atari. We use diffusion factor β = 0.5 for all methods by following Lee et al. (2019).
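
For quick reference, the hyperparameters quoted above can be gathered in one place. The snippet below is a minimal, hypothetical Python configuration; the dictionary keys, constant names, and printing loop are illustrative and are not taken from the OB2I repository. It simply restates the values reported in the paper.

```python
# Hypothetical configuration restating the hyperparameters quoted above.
# Key names and structure are illustrative; they do not mirror the released
# OB2I code at https://github.com/Baichenjia/OB2I.

OB2I_REPORTED_SETUP = {
    "alpha_1": 0.5e-4,             # bonus weight, tuned over five Atari games
    "alpha_2": 0.5e-4,             # bonus weight, same value as alpha_1
    "beta": 0.5,                   # diffusion factor, following Lee et al. (2019)
    "training_frames": 20_000_000, # 20M frames per run for BEBU variants and OB2I
    "random_seeds": 5,             # number of random seeds per method
}

# Games used to tune alpha_1 and alpha_2, as listed in the paper.
TUNING_GAMES = ["Breakout", "Freeway", "Qbert", "Seaquest", "SpaceInvaders"]

if __name__ == "__main__":
    for key, value in OB2I_REPORTED_SETUP.items():
        print(f"{key}: {value}")
```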