Cell-Free Latent Go-Explore
Authors: Quentin Gallouédec, Emmanuel Dellandrea
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate LGE, we conducted experiments in the context of reward-free exploration in various hard-exploration environments including a maze, a robotic arm interacting with an object, and two Atari games known for their high exploration difficulty: Montezuma's Revenge and Pitfall. We show in Section 4.4 that for the environments studied, LGE outperforms all state-of-the-art algorithms studied in this paper, and in particular Go-Explore for the exploration task. We present statistically robust empirical results conducted on diverse environments, including robotic systems and Atari games, that demonstrate our approach's significant improvement in exploration performance. |
| Researcher Affiliation | Academia | 1Univ Lyon, Centrale Lyon, CNRS, INSA Lyon, UCBL, LIRIS, UMR5205, F-69130 Ecully, France. Correspondence to: Quentin Gallouédec <quentin.gallouedec@ec-lyon.fr>. |
| Pseudocode | Yes | The pseudo-code of the resulting algorithm is presented in Algorithm 1. |
| Open Source Code | Yes | The LGE implementation is available as open-source at https://github.com/qgallouedec/lge. |
| Open Datasets | Yes | We simulate a Franka robot under the PyBullet physics engine using panda-gym (Gallouédec et al., 2021). Atari: We train LGE on two high-dimensional Atari 2600 environments simulated through the Arcade Learning Environment (ALE; Bellemare et al., 2013). |
| Dataset Splits | No | The paper mentions hyperparameter optimization and training with multiple seeds but does not specify explicit train/validation/test dataset splits with percentages or counts. |
| Hardware Specification | Yes | In terms of infrastructure, each run was performed on a single worker machine equipped with one CPU and one NVIDIA V100 GPU + 120 Gb of RAM. |
| Software Dependencies | Yes | To nullify the variation in results due to different implementations, we implement all algorithms in the same framework: Stable-Baselines3 (Raffin et al., 2021). For Atari, we use QR-DQN (Dabney et al., 2018). We use a VQ-VAE (van den Oord et al., 2017). We simulate a Franka robot under the PyBullet physics engine using panda-gym (Gallouédec et al., 2021). |
| Experiment Setup | Yes | The hyperparameters for this algorithm are identical. For the maze environment, we use SAC, while for the robotic environment, we use DDPG as it gives better results for all methods. For Atari environments, we use QR-DQN (Dabney et al., 2018). Appendix A details the optimization process and the resulting hyperparameters. Table 1, Table 2, and Table 3 detail hyperparameter search spaces and values for LGE, Go-Explore, ICM, Surprise, DIAYN, Skew-Fit, SAC, DDPG, and QR-DQN. |
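The Software Dependencies and Experiment Setup rows describe a Stable-Baselines3 stack: SAC and DDPG for the continuous-control environments (a Franka Panda simulated with PyBullet via panda-gym) and QR-DQN for the Atari games. The sketch below shows how such a baseline stack can be wired together; it is illustrative only, not the paper's configuration. The environment IDs, policies, and timestep budgets are placeholder assumptions (the paper's actual hyperparameters are in its Appendix A and Tables 1-3), and QR-DQN ships in the companion `sb3_contrib` package rather than in Stable-Baselines3 itself.

```python
# Illustrative baseline setup with Stable-Baselines3, sb3_contrib, and panda-gym.
# Environment IDs, hyperparameters, and timestep budgets are placeholders, not
# the values reported in the paper (see its Appendix A).
import gymnasium as gym
import panda_gym  # noqa: F401  registers the Panda robot environments
from stable_baselines3 import DDPG  # the paper uses SAC for the maze, DDPG for the robot
from sb3_contrib import QRDQN

# Continuous control: a Franka Panda task simulated with PyBullet via panda-gym.
robot_env = gym.make("PandaReach-v3")  # placeholder task
robot_agent = DDPG("MultiInputPolicy", robot_env, verbose=1)
robot_agent.learn(total_timesteps=100_000)  # placeholder budget

# Atari: QR-DQN (Dabney et al., 2018) from the sb3_contrib add-on package.
# Assumes the ALE Atari environments (ale-py and the ROMs) are installed and registered.
atari_env = gym.make("ALE/MontezumaRevenge-v5")
atari_agent = QRDQN("CnnPolicy", atari_env, verbose=1)
atari_agent.learn(total_timesteps=1_000_000)  # placeholder budget
```

Per the Experiment Setup row, SAC is used for the maze environment and DDPG for the robotic one because it gave better results across methods; the hyperparameter search spaces and selected values for LGE, Go-Explore, ICM, Surprise, DIAYN, Skew-Fit, SAC, DDPG, and QR-DQN are listed in the paper's Tables 1-3.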