Curiosity-Driven Exploration via Latent Bayesian Surprise

Authors: Pietro Mazzaglia, Ozan Catal, Tim Verbelen, Bart Dhoedt

AAAI 2022, pp. 7752-7760 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We extensively evaluate our method by measuring the agent's performance in terms of environment exploration, for continuous tasks, and looking at the game scores achieved, for video games. Our model is computationally cheap and compares positively with current state-of-the-art methods on several problems. We also investigate the effects caused by stochasticity in the environment, which is often a failure case for curiosity-driven agents. In this regime, the results suggest that our approach is resilient to stochastic transitions. From the Experiments section: The aim of the experiments is to compare the performance of the LBS model and intrinsic rewards against other approaches for exploration in RL. Main results are presented with respect to three sets of environments: continuous control tasks, discrete-action games, and tasks with stochastic transitions. The continuous control tasks include the classic Mountain Car environment (Moore 1990), the Mujoco-based Half Cheetah environment (Todorov, Erez, and Tassa 2012), and the Ant Maze environment used in (Shyam, Jaśkowski, and Gomez 2019). The discrete-action games include 8 video games from the Atari Learning Environment (ALE; Bellemare et al. (2013)) and the Super Mario Bros. game, which is a popular NES platform game. The stochastic tasks include an image-prediction task with stochastic dynamics and two stochastic variants of Mountain Car, including a Noisy-TV-like component. In this section, we consider curious agents that only optimize their self-supervised signal for exploration.
Researcher Affiliation | Academia | IDLab, Ghent University; pietro.mazzaglia@ugent.be, ozan.catal@ugent.be, tim.verbelen@ugent.be, bart.dhoedt@ugent.be
Pseudocode | No | The paper does not include a section or figure explicitly labeled "Pseudocode" or "Algorithm", nor does it present its method in a structured, code-like format.
Open Source Code | No | Further visualization is available on the project webpage: https://lbsexploration.github.io/ (Note: the linked webpage states "The code will be made public soon!", indicating the code is not yet openly available.)
Open Datasets | Yes | The continuous control tasks include the classic Mountain Car environment (Moore 1990), the Mujoco-based Half Cheetah environment (Todorov, Erez, and Tassa 2012), and the Ant Maze environment used in (Shyam, Jaśkowski, and Gomez 2019). The discrete-action games include 8 video games from the Atari Learning Environment (ALE; Bellemare et al. (2013)) and the Super Mario Bros. game, which is a popular NES platform game. Similarly to (Pathak, Gandhi, and Gupta 2019), we employ the Noisy MNIST dataset (Le Cun et al. 1995) to perform an experiment on stochastic transitions. (A hedged environment-setup sketch follows the table.)
Dataset Splits | No | The paper does not explicitly specify dataset splits (e.g., percentages or counts) for training, validation, and testing sets needed for reproduction.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies | No | The paper mentions using the "Proximal Policy Optimization algorithm (PPO; Schulman et al. (2017))" and "neural networks" but does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | For all tasks, we update the policy using the Proximal Policy Optimization algorithm (PPO; Schulman et al. (2017)). For all of the model's components, we use neural networks. For the model's latent stochastic variable, we use distributional layers implemented as linear layers that output the means and standard deviations of a multivariate Gaussian. This means that we omit any external rewards, by setting ηe = 0 (see Background). We train the models uniformly sampling random transitions in batches of 128 samples and run the experiments with ten random seeds. (A hedged code sketch of this setup follows the table.)
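The environments quoted in the Open Datasets row are all standard, publicly available benchmarks. Below is a minimal sketch of how they could be instantiated; the Gym registry IDs and the third-party Mario package are assumptions, since the paper names the environments but not the exact versions, wrappers, or preprocessing it used.

```python
# Minimal sketch of instantiating the quoted benchmarks with OpenAI Gym.
# All registry IDs below are assumptions; the paper names the environments
# but not the exact versions or wrappers it used.
import gym

# Continuous control tasks.
mountain_car = gym.make("MountainCarContinuous-v0")  # classic Mountain Car (Moore 1990)
half_cheetah = gym.make("HalfCheetah-v2")            # MuJoCo-based Half Cheetah

# Discrete-action games: one of the 8 ALE games (Breakout as a placeholder).
atari = gym.make("BreakoutNoFrameskip-v4")

# Super Mario Bros. is commonly exposed through the third-party
# gym-super-mario-bros package:
# import gym_super_mario_bros
# mario = gym_super_mario_bros.make("SuperMarioBros-v0")
```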
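The Experiment Setup row also maps onto a short PyTorch sketch. The code below illustrates the quoted distributional layer (a linear layer outputting the mean and standard deviation of a diagonal Gaussian) and a latent-Bayesian-surprise-style intrinsic reward, assuming, as the paper's title suggests, that the reward is the KL divergence between a posterior and a prior over the latent variable. All module names and dimensions (GaussianLayer, prior_net, posterior_net, in_dim, latent_dim) are hypothetical, not the authors' reported architecture.

```python
# Sketch of the Gaussian distributional layer and a latent-Bayesian-surprise
# style intrinsic reward. The KL-based reward form is inferred from the
# paper's title; layer sizes and factorization are illustrative assumptions.
import torch
import torch.nn as nn
from torch.distributions import Normal, kl_divergence

class GaussianLayer(nn.Module):
    """Linear layer that outputs the mean and std of a diagonal Gaussian."""
    def __init__(self, in_dim, latent_dim):
        super().__init__()
        self.mean = nn.Linear(in_dim, latent_dim)
        self.log_std = nn.Linear(in_dim, latent_dim)

    def forward(self, x):
        return Normal(self.mean(x), self.log_std(x).exp())

# Hypothetical prior p(z | s_t, a_t) and posterior q(z | s_t, a_t, s_{t+1}),
# each fed some encoded features of the transition.
prior_net = GaussianLayer(in_dim=64, latent_dim=32)
posterior_net = GaussianLayer(in_dim=96, latent_dim=32)

def intrinsic_reward(h_prior, h_posterior):
    # Bayesian surprise: KL between posterior and prior beliefs over the latent.
    prior = prior_net(h_prior)
    posterior = posterior_net(h_posterior)
    return kl_divergence(posterior, prior).sum(-1)  # one scalar per transition
```

With ηe = 0 (no external reward), a KL term like this would be the only reward signal passed to PPO when updating the policy; per the quoted setup, the model is trained on uniformly sampled transition batches of 128, with ten random seeds per experiment.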