Improving Intrinsic Exploration by Creating Stationary Objectives

Authors: Roger Creus Castanyer, Joshua Romoff, Glen Berseth

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Section 5 (Experiments): "SOFE is designed to improve the performance of exploration tasks. To evaluate its efficacy, we study three questions: (1) How much does SOFE facilitate the optimization of non-stationary exploration bonuses? (2) Does this increased stationarity improve exploration for downstream tasks? (3) How well does SOFE scale to image-based state inputs where approximations are needed to estimate state-visitation frequencies? To answer each of these research questions, we run the experiments as follows."
Researcher Affiliation | Collaboration | Roger Creus Castanyer (Mila – Québec AI Institute, Université de Montréal); Joshua Romoff (Ubisoft La Forge, joshua.romoff@ubisoft.com); Glen Berseth (Mila – Québec AI Institute, Université de Montréal); {roger.creus-castanyer, glen.berseth}@mila.quebec
Pseudocode | Yes | Appendix A.7 (State-Entropy Maximization): "In this section, we provide the pseudo-code for the surprise-maximization algorithm presented in Section 3.1.3." See Algorithm 1 (Surprise Maximization).
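For intuition on why such exploration bonuses are non-stationary, here is a minimal Python sketch of a count-based intrinsic reward together with a SOFE-style state augmentation. This is an illustrative example under simplifying tabular assumptions, not the paper's Algorithm 1; all names are hypothetical.

```python
import numpy as np
from collections import defaultdict

class CountBonus:
    """Illustrative count-based exploration bonus (hypothetical; not the
    paper's Algorithm 1)."""

    def __init__(self):
        self.counts = defaultdict(int)  # N(s): per-state visitation counts

    def intrinsic_reward(self, state):
        # r_int(s) = 1 / sqrt(N(s)) is non-stationary: the same state
        # yields a different reward as N(s) grows during training.
        key = tuple(np.asarray(state).ravel().tolist())
        self.counts[key] += 1
        return 1.0 / np.sqrt(self.counts[key])

    def augmented_obs(self, obs, num_states):
        # SOFE-style fix: append the bonus parameters (the counts) to the
        # observation so the policy conditions on them, making the reward
        # a stationary function of the augmented state (tabular case).
        count_vec = np.zeros(num_states, dtype=float)
        for key, n in self.counts.items():
            count_vec[key[0]] = n  # assumes scalar integer states
        return np.concatenate(
            [np.atleast_1d(np.asarray(obs, dtype=float)), count_vec]
        )
```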
Open Source Code | No | "Videos of the trained agents and summarized findings can be found on our supplementary webpage."
Open Datasets | Yes | The Deep Sea sparse-reward hard-exploration task from DeepMind's bsuite (Osband et al., 2019); the MiniHack-MultiRoom-N6-v0 task, originally used for E3B in Henaff et al. (2023); the Procgen-Maze task (Cobbe et al., 2020); the Habitat environment (Szot et al., 2021); and the HM3D dataset (Ramakrishnan et al., 2021).
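As a reference for the publicly available tasks named above, a hypothetical setup snippet is sketched below. The env IDs follow the public MiniHack, Procgen, and bsuite releases; the exact package versions, wrappers, and Habitat/HM3D configuration the authors used may differ.

```python
import gym
import minihack  # noqa: F401 -- importing registers MiniHack envs with gym
import bsuite
from bsuite.utils import gym_wrapper

# MiniHack-MultiRoom-N6-v0, the E3B task quoted above
minihack_env = gym.make("MiniHack-MultiRoom-N6-v0")

# Procgen-Maze (Procgen registers its envs under the "procgen:" namespace)
procgen_env = gym.make("procgen:procgen-maze-v0")

# Deep Sea from bsuite; the size/seed id "deep_sea/0" is an arbitrary choice
deep_sea_env = gym_wrapper.GymFromDMEnv(bsuite.load_from_id("deep_sea/0"))
```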
Dataset Splits | No | No explicit statement of training, validation, and test splits (percentages or counts) was found; the paper evaluates within reinforcement-learning environments rather than on fixed datasets.
Hardware Specification | No | "We optimize the E3B exploration bonus with PPO (Schulman et al., 2017), which requires 31 hours on a machine with a single GPU."
Software Dependencies | No | "We use Stable-Baselines3 (Raffin et al., 2021) to run our experiments in the mazes, Godot maps, and Deep Sea."
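For context on the quoted dependency, a minimal Stable-Baselines3 training run looks like the sketch below. The environment and hyperparameter values here are placeholders, not the paper's setup; the reported hyperparameters are in Tables 2–4 of the appendix.

```python
import gymnasium as gym
from stable_baselines3 import PPO

# Stand-in environment; the paper uses mazes, Godot maps, and Deep Sea.
env = gym.make("CartPole-v1")

# Placeholder hyperparameters; the paper's PPO values are in its Table 3.
model = PPO("MlpPolicy", env, learning_rate=3e-4, n_steps=2048, verbose=1)
model.learn(total_timesteps=100_000)
```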
Experiment Setup | Yes | Appendix A.3 (Training Details); Table 2: Hyperparameters for the DQN implementation; Table 3: Hyperparameters for the PPO implementation; Table 4: Hyperparameters for the A2C implementation.