reproducibilityindex.ai

Improving Intrinsic Exploration by Creating Stationary Objectives

Authors: Roger Creus Castanyer, Joshua Romoff, Glen Berseth

ICLR 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	(Section 5, Experiments) SOFE is designed to improve the performance of exploration tasks. To evaluate its efficacy, we study three questions: (1) How much does SOFE facilitate the optimization of non-stationary exploration bonuses? (2) Does this increased stationarity improve exploration for downstream tasks? (3) How well does SOFE scale to image-based state inputs where approximations are needed to estimate state-visitation frequencies? To answer each of these research questions, we run the experiments as follows.
Researcher Affiliation	Collaboration	Roger Creus Castanyer, Mila Québec AI Institute, Université de Montréal; Joshua Romoff, Ubisoft La Forge (joshua.romoff@ubisoft.com); Glen Berseth, Mila Québec AI Institute, Université de Montréal ({roger.creus-castanyer, glen.berseth}@mila.quebec)
Pseudocode	Yes	(Appendix A.7, State-Entropy Maximization) In this section, we provide the pseudo-code for the surprise-maximization algorithm presented in Section 3.1.3. ... Algorithm 1: Surprise Maximization
Open Source Code	No	Videos of the trained agents and summarized findings can be found on our supplementary webpage.
Open Datasets	Yes	Deep Sea sparse-reward hard-exploration task from the DeepMind suite (Osband et al., 2019); MiniHack-MultiRoom-N6-v0 task, originally used for E3B in Henaff et al. (2023); Procgen-Maze task (Cobbe et al., 2020); Habitat environment (Szot et al., 2021); HM3D dataset (Ramakrishnan et al., 2021)
Dataset Splits	No	No explicit statement of training, validation, and test dataset splits (with percentages or counts) was found. The paper's experiments are run in reinforcement learning environments rather than on fixed, pre-split datasets.
Hardware Specification	No	We optimize the E3B exploration bonus with PPO (Schulman et al., 2017), which requires 31 hours on a machine with a single GPU.
Software Dependencies	No	We use Stable-Baselines3 (Raffin et al., 2021) to run our experiments in the mazes, Godot maps, and Deep Sea.
Experiment Setup	Yes	Appendix A.3 (Training Details); Table 2: Hyperparameters for the DQN Implementation; Table 3: Hyperparameters for the PPO Implementation; Table 4: Hyperparameters for the A2C Implementation