Principled Exploration via Optimistic Bootstrapping and Backward Induction
Authors: Chenjia Bai, Lingxiao Wang, Lei Han, Jianye Hao, Animesh Garg, Peng Liu, Zhaoran Wang
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments in the MNIST maze and Atari suite suggest that OB2I outperforms several state-of-the-art exploration approaches. We evaluate OB2I empirically by solving the MNIST maze and 49 Atari games. |
| Researcher Affiliation | Collaboration | ¹Harbin Institute of Technology, Harbin, China; ²Northwestern University, Evanston, USA; ³Tencent Robotics X; ⁴Tianjin University; ⁵University of Toronto, Vector Institute. |
| Pseudocode | Yes | Algorithm 1 LSVI-UCB in linear MDP (a hedged sketch of this backward-induction procedure appears after the table). |
| Open Source Code | Yes | The code is available at https://github.com/Baichenjia/OB2I. |
| Open Datasets | Yes | We evaluate the algorithms in high-dimensional image-based tasks, including MNIST Maze (Lee et al., 2019) and 49 Atari games. |
| Dataset Splits | No | The paper discusses training frames and evaluation protocols for the RL environments, but does not specify dataset splits (e.g., percentages or counts) for a separate validation set, as would be typical for experiments on static datasets. |
| Hardware Specification | Yes | BEBU, BEBU-UCB, BEBU-IDS and OB2I are trained for 20M frames with RTX-2080Ti GPU for 5 random seeds. |
| Software Dependencies | No | The paper mentions methods such as Bootstrapped DQN and deep reinforcement learning in general terms, but does not provide version numbers for any software libraries, frameworks, or languages used in the experiments. |
| Experiment Setup | Yes | For OB2I, we set both α₁ and α₂ to 0.5 × 10⁻⁴ by tuning over five popular tasks: Breakout, Freeway, Qbert, Seaquest, and Space Invaders. Generally, small α₁ and α₂ yield better performance empirically, since the bonus accumulates along an episode that usually contains thousands of steps in Atari. We use diffusion factor β = 0.5 for all methods, following Lee et al. (2019). (A hedged illustration of these hyperparameters appears after the table.) |
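
The pseudocode row above quotes Algorithm 1, LSVI-UCB in a linear MDP. The sketch below illustrates the core of that procedure as usually stated (Jin et al., 2020): backward induction over the horizon, ridge regression on a known feature map, and an elliptical-confidence UCB bonus. The function names and the `data`/`phi` interfaces are our own assumptions, not the paper's code.

```python
import numpy as np

def lsvi_ucb_backward(data, phi, d, H, num_actions, beta, lam=1.0):
    """Backward-induction value fitting in the spirit of Algorithm 1
    (LSVI-UCB in a linear MDP). A hedged sketch under assumed interfaces:
      data[h] -> list of past transitions (s, a, r, s_next) at step h
      phi(s, a) -> known d-dimensional feature map
    """
    weights = [np.zeros(d) for _ in range(H)]
    cov_inv = [np.eye(d) / lam for _ in range(H)]

    def q(h, s, a):
        f = phi(s, a)
        # Elliptical UCB bonus: beta * sqrt(phi^T Lambda_h^{-1} phi)
        bonus = beta * np.sqrt(f @ cov_inv[h] @ f)
        return min(weights[h] @ f + bonus, float(H))  # optimism, clipped at H

    # Backward induction: fit step h using the optimistic targets of h+1.
    for h in reversed(range(H)):
        Lam = lam * np.eye(d)          # regularized Gram matrix Lambda_h
        targets = np.zeros(d)
        for s, a, r, s_next in data[h]:
            f = phi(s, a)
            Lam += np.outer(f, f)
            v_next = 0.0 if h == H - 1 else max(
                q(h + 1, s_next, b) for b in range(num_actions))
            targets += f * (r + v_next)
        cov_inv[h] = np.linalg.inv(Lam)
        weights[h] = cov_inv[h] @ targets  # ridge-regression solution w_h
    return q  # act greedily: argmax over a of q(h, s, a)
```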
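
The experiment-setup row reports α₁ = α₂ = 0.5 × 10⁻⁴ and diffusion factor β = 0.5. The snippet below is a minimal sketch of how such a bonus scale and diffusion factor might interact in an episodic backward update: a small α matters because the propagated bonus compounds over the thousands of steps in an Atari episode. Everything here (`q_ensemble`, `backward_bonus`, the exact propagation rule) is our illustrative assumption, not OB2I's released implementation at the repository linked above.

```python
import numpy as np

# Hyperparameters quoted in the Experiment Setup row; the names are ours.
ALPHA = 0.5e-4   # bonus scale (alpha_1 / alpha_2 in the paper)
BETA = 0.5       # diffusion factor for the episodic backward update

def backward_bonus(q_ensemble, episode, alpha=ALPHA, beta=BETA):
    """Sketch of a UCB-style bonus propagated backward through one episode.
    Assumed interfaces (hypothetical, not the OB2I source):
      q_ensemble(s, a) -> array of Q-estimates, one per bootstrap head
      episode          -> list of (s, a) pairs in time order
    The per-step bonus is the ensemble standard deviation (epistemic
    spread); beta mixes it with the propagated future bonus.
    """
    propagated = 0.0
    bonuses = [0.0] * len(episode)
    for t in reversed(range(len(episode))):
        s, a = episode[t]
        step_bonus = alpha * np.std(q_ensemble(s, a))
        propagated = step_bonus + beta * propagated  # backward diffusion
        bonuses[t] = propagated
    return bonuses  # added to TD targets as optimistic exploration bonuses
```

With β = 0.5 the propagated bonus is at most a geometric sum of twice the per-step bonus, which is consistent with the paper's observation that small α₁ and α₂ work better because the bonus accumulates along long episodes.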