Cell-Free Latent Go-Explore

Authors: Quentin Gallouédec, Emmanuel Dellandrea

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To evaluate LGE, we conducted experiments in the context of reward-free exploration in various hard-exploration environments, including a maze, a robotic arm interacting with an object, and two Atari games known for their high exploration difficulty: Montezuma's Revenge and Pitfall. We show in Section 4.4 that for the environments studied, LGE outperforms all state-of-the-art algorithms studied in this paper, and in particular Go-Explore for the exploration task. We present statistically robust empirical results conducted on diverse environments, including robotic systems and Atari games, that demonstrate our approach's significant improvement in exploration performance.
Researcher Affiliation | Academia | Univ Lyon, Centrale Lyon, CNRS, INSA Lyon, UCBL, LIRIS, UMR5205, F-69130 Ecully, France. Correspondence to: Quentin Gallouédec <quentin.gallouedec@ec-lyon.fr>.
Pseudocode | Yes | The pseudo-code of the resulting algorithm is presented in Algorithm 1.
Open Source Code | Yes | The LGE implementation is available as open-source at https://github.com/qgallouedec/lge.
Open Datasets | Yes | We simulate a Franka robot under the PyBullet physics engine using panda-gym (Gallouédec et al., 2021). Atari: We train LGE on two high-dimensional Atari 2600 environments simulated through the Arcade Learning Environment (ALE, Bellemare et al. (2013)).
Dataset Splits | No | The paper mentions hyperparameter optimization and training with multiple seeds but does not specify explicit train/validation/test dataset splits with percentages or counts.
Hardware Specification | Yes | In terms of infrastructure, each run was performed on a single worker machine equipped with one CPU and one NVIDIA V100 GPU + 120 Gb of RAM.
Software Dependencies | Yes | To nullify the variation in results due to different implementations, we implement all algorithms in the same framework: Stable-Baselines3 (Raffin et al., 2021). For Atari, we use QR-DQN (Dabney et al., 2018). We use a VQ-VAE (van den Oord et al., 2017). We simulate a Franka robot under the PyBullet physics engine using panda-gym (Gallouédec et al., 2021).
Experiment Setup | Yes | The hyperparameters for this algorithm are identical. For the maze environment, we use SAC, while for the robotic environment, we use DDPG as it gives better results for all methods. For Atari environments, we use QR-DQN (Dabney et al., 2018). Appendix A details the optimization process and the resulting hyperparameters. Table 1, Table 2, and Table 3 detail hyperparameter search spaces and values for LGE, Go-Explore, ICM, Surprise, DIAYN, Skew-Fit, SAC, DDPG, and QR-DQN.
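The Open Datasets row cites panda-gym for the Franka robot tasks and the Arcade Learning Environment for Montezuma's Revenge and Pitfall. The sketch below shows one way to instantiate such environments with gymnasium; the environment IDs, package versions, and registration step are illustrative assumptions, not the exact configurations used in the paper.

```python
# Hedged sketch: instantiating panda-gym and ALE environments with gymnasium.
# The environment IDs below are illustrative; the paper's exact tasks may differ.
import gymnasium as gym
import panda_gym  # registers the Panda* robotic tasks on import (assumes panda-gym >= 3.0)
import ale_py     # Arcade Learning Environment backend for the Atari tasks

# Recent gymnasium/ale-py releases require explicit registration of the ALE environments;
# older releases register them automatically at install time.
if hasattr(gym, "register_envs"):
    gym.register_envs(ale_py)

robot_env = gym.make("PandaReach-v3")            # example Franka/PyBullet task from panda-gym
atari_env = gym.make("ALE/MontezumaRevenge-v5")  # hard-exploration Atari game cited in the paper

obs, info = robot_env.reset(seed=0)
print(robot_env.action_space, atari_env.observation_space)
```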
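The Software Dependencies and Experiment Setup rows state that all methods share a Stable-Baselines3 implementation, with SAC for the maze, DDPG for the robotic arm, and QR-DQN (available in sb3-contrib) for Atari. The following sketch instantiates these base learners; the stand-in environments, policy classes, and hyperparameters are placeholders, not the tuned values reported in the paper's Appendix A.

```python
# Hedged sketch: base learners named in the paper, via Stable-Baselines3 and sb3-contrib.
# Environments and hyperparameters here are placeholders, not the paper's tuned settings.
import gymnasium as gym
import ale_py
from stable_baselines3 import SAC, DDPG
from sb3_contrib import QRDQN

if hasattr(gym, "register_envs"):
    gym.register_envs(ale_py)  # needed for ALE/* IDs on recent gymnasium/ale-py versions

control_env = gym.make("Pendulum-v1")   # stand-in continuous-control task (illustrative)
atari_env = gym.make("ALE/Pitfall-v5")  # one of the Atari tasks cited in the paper

sac_agent = SAC("MlpPolicy", control_env, learning_rate=3e-4, verbose=0)
ddpg_agent = DDPG("MlpPolicy", control_env, learning_rate=1e-3, verbose=0)
qrdqn_agent = QRDQN("CnnPolicy", atari_env, buffer_size=50_000, verbose=0)

# All Stable-Baselines3 agents share the same training entry point; the budget is a placeholder.
sac_agent.learn(total_timesteps=10_000)
```

In the paper, these are the base reinforcement-learning algorithms on which LGE and the baselines are built; the snippet only covers their instantiation, not the exploration logic layered on top.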
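The same Software Dependencies row mentions a VQ-VAE (van den Oord et al., 2017) for learning the latent representation. Below is a generic, self-contained sketch of the vector-quantization step of a VQ-VAE in PyTorch; the codebook size, latent dimension, and commitment coefficient are assumed values, and the paper's actual encoder/decoder architecture is not reproduced here.

```python
# Generic vector-quantization layer in the spirit of VQ-VAE (van den Oord et al., 2017).
# Codebook size, latent dimension, and commitment coefficient are assumed, not the paper's values.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorQuantizer(nn.Module):
    def __init__(self, num_codes: int = 64, code_dim: int = 16, beta: float = 0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, code_dim)
        self.codebook.weight.data.uniform_(-1.0 / num_codes, 1.0 / num_codes)
        self.beta = beta

    def forward(self, z_e: torch.Tensor):
        # z_e: (batch, code_dim) continuous encoder output
        distances = torch.cdist(z_e, self.codebook.weight)  # (batch, num_codes)
        indices = distances.argmin(dim=1)                    # nearest codebook entry per sample
        z_q = self.codebook(indices)                         # quantized latents
        # Codebook loss + commitment loss, then straight-through estimator for encoder gradients.
        loss = F.mse_loss(z_q, z_e.detach()) + self.beta * F.mse_loss(z_e, z_q.detach())
        z_q = z_e + (z_q - z_e).detach()
        return z_q, indices, loss

if __name__ == "__main__":
    vq = VectorQuantizer()
    z_q, indices, vq_loss = vq(torch.randn(8, 16))
    print(z_q.shape, indices.shape, vq_loss.item())
```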