Latent Exploration for Reinforcement Learning
Authors: Alberto Silvio Chiappa, Alessandro Marin Vargas, Ann Huang, Alexander Mathis
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | With extensive experiments, we show that Lattice can replace standard unstructured exploration [2, 5] and time-only-correlated exploration (gSDE) [8] in off-policy (SAC) and on-policy (PPO) RL algorithms, and improve performance in complex motor control tasks. Importantly, we demonstrate that Lattice-SAC is competitive in standard benchmarks for continuous control, such as the locomotion environments of PyBullet [20]. We benchmarked Lattice on standard locomotion tasks [47, 6, 16, 48–50] in PyBullet [20], as well as musculoskeletal control tasks of MyoSuite [18] built in MuJoCo [31]. All the results are averaged across 5 random seeds. |
| Researcher Affiliation | Academia | Alberto Silvio Chiappa, École Polytechnique Fédérale de Lausanne (EPFL), alberto.chiappa@epfl.ch; Alessandro Marin Vargas, EPFL, alessandro.marinvargas@epfl.ch; Ann Zixiang Huang, Mila, EPFL, zixiang.huang@mail.mcgill.ca; Alexander Mathis, EPFL, alexander.mathis@epfl.ch |
| Pseudocode | Yes | Algorithm 1 Standard (e.g., PPO, SAC) and Algorithm 2 Lattice are presented on page 4. |
| Open Source Code | Yes | The code is available at: https://github.com/amathislab/lattice. |
| Open Datasets | Yes | We benchmarked Lattice on standard locomotion tasks [47, 6, 16, 48–50] in PyBullet [20], as well as musculoskeletal control tasks of MyoSuite [18] built in MuJoCo [31]. |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits with percentages or sample counts. It refers to training in environments and averaging results over random seeds, which is common in RL, but not explicit dataset splits. |
| Hardware Specification | No | The training was run on a GPU cluster, for a total of approximately 10,000 GPU-hours. No specific GPU models, CPU types, or detailed cluster specifications are provided beyond 'GPU cluster'. |
| Software Dependencies | No | We implemented Lattice as an extension of gSDE in the RL library Stable Baselines3 [45]. While Stable Baselines3 is mentioned, no specific version number is provided for it or any other software dependency. |
| Experiment Setup | Yes | We used the same network architecture and hyperparameters for SAC specified in [46] for all the environments (see Appendix A.7). Tables T3, T4, and T5 in Appendix A.7 provide detailed hyperparameters for SAC, PPO, gSDE, and Lattice for various tasks. |
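
The table notes that all results are averaged across 5 random seeds. As a minimal sketch of what that aggregation could look like when reproducing the experiments, the snippet below computes the mean and sample standard deviation of one scalar metric per seed. The function name `aggregate_over_seeds` and the numbers in `placeholder_returns` are hypothetical, introduced only for illustration; they are not taken from the paper or its code repository.

```python
from statistics import mean, stdev


def aggregate_over_seeds(returns_by_seed):
    """Return (mean, sample standard deviation) of a per-seed metric.

    `returns_by_seed` maps a seed (int) to one scalar metric, for example
    the final episodic return of the run trained with that seed.
    """
    values = list(returns_by_seed.values())
    if len(values) < 2:
        raise ValueError("need at least two seeds to estimate the spread")
    return mean(values), stdev(values)


# Placeholder numbers for illustration only; they are NOT results from the paper.
placeholder_returns = {0: 10.0, 1: 12.0, 2: 11.0, 3: 9.0, 4: 13.0}

avg, spread = aggregate_over_seeds(placeholder_returns)
print(f"mean over {len(placeholder_returns)} seeds: {avg:.2f} (std {spread:.2f})")
```

Reporting the spread next to the mean is a design choice of this sketch; the table above only states that results are averaged, not which dispersion measure the authors report.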