Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Exploration by Learning Diverse Skills through Successor State Representations
Authors: Paul-Antoine LE TOLGUENEC, Yann BESSE, Florent Teichteil-Koenigsbuch, Dennis Wilson, Emmanuel Rachelson
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate our approach on a set of maze navigation and robotic control tasks which show that our method is capable of constructing a diverse set of skills which exhaustively cover the state space without relying on reward or exploration bonuses. |
| Researcher Affiliation | Collaboration | Paul-Antoine Le Tolguenec, ISAE-Supaero, Airbus, EMAIL; Yann Besse, Airbus, EMAIL; Florent Teichteil-Königsbuch, Airbus, EMAIL; Dennis G. Wilson, ISAE-Supaero, Université de Toulouse, EMAIL; Emmanuel Rachelson, ISAE-Supaero, Université de Toulouse, EMAIL |
| Pseudocode | Yes | Algorithm 1 LEADS: Initialize θ0; for t ∈ [0, N] do; # Collect samples: Dz = ∅, ∀z ∈ Z; for e ∈ [1, nep] do; Sample skill z ∼ p(z); {(st, at, rt, s′t)} = episode with πθt(·, z) from s0; Dz = Dz ∪ {(st, at, rt, s′t)}; end for; # Learn the SSR: Learn mϕt for πθt using on-policy C-learning; Sample s ∼ δ(s\|z); # Improve θ: for i ∈ [1, nSGD] do; Sample z ∼ p(z), s1 ∼ p(s\|z); θ ← θ + α∇θ[G(θ) + λh H(θ)]; Update ϕt using off-policy C-learning; end for; end for |
| Open Source Code | Yes | We provide all code for LEADS and the baseline algorithms, as well as the scripts to reproduce the experiments (repository). |
| Open Datasets | Yes | We evaluate LEADS on a variety of MuJoCo [42] environments from different benchmark suites. Fetch-Reach [37] is a 7-DoF (degrees of freedom) robotic arm equipped with a two-fingered parallel gripper; its observation space is 10-dimensional. Fetch-Slide extends the former with a puck placed on a platform in front of the arm, increasing the observation space dimension to 25. Hand [37] is a 24-DoF anthropomorphic robotic hand, with a 63-dimensional observation space. Finger [44] is a 3-DoF manipulation environment with a 12-dimensional observation space, where a planar finger is required to rotate an object on an unactuated hinge. |
| Dataset Splits | No | The paper describes training on various reinforcement learning environments and evaluates performance across tasks, but does not specify explicit train/validation/test dataset splits in terms of percentages, counts, or predefined partition files. |
| Hardware Specification | No | This work was performed using HPC resources from CALMIP (Grant 2016-[p21001]). |
| Software Dependencies | No | The paper mentions software like MuJoCo [42] and the Gymnasium suite [43], but it does not provide specific version numbers for these or any other key software dependencies required to replicate the experiment. |
| Experiment Setup | Yes | The following table (Table 3) summarizes the hyperparameters used in our experimental setup. Table 3: Hyperparameters used for LEADS (hyperparameter = value): n_skill = 6; z_dim = 20; λ_h = 0.05; γ = 0.95; λ_c-learning = 0.5; α_θ = 5 × 10⁻⁴; α_c-learning = 5 × 10⁻⁴; n_episode = 16; n_SGD,c-learning = 256; n_SGD,actor = 16; n_archive = 1; batch size_c-learning = 1024; batch size_loss = 1024 |
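
For readers who want to reuse the reported setup, the hyperparameters quoted in the Experiment Setup row can be collected into a single configuration object. The sketch below is only an illustration: the class name `LeadsHyperparameters` and all field names are hypothetical and are not taken from the authors' repository, and the two learning rates assume that the flattened "5 10 4" in the extracted table stands for 5 × 10⁻⁴.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class LeadsHyperparameters:
    """Values transcribed from Table 3 as quoted above.

    Field names are hypothetical; they do not come from the LEADS code base.
    """
    n_skill: int = 6                  # number of skills
    z_dim: int = 20                   # skill embedding dimension
    lambda_h: float = 0.05            # weight on H(theta) in the actor update
    gamma: float = 0.95               # discount factor
    lambda_c_learning: float = 0.5    # C-learning coefficient
    alpha_theta: float = 5e-4         # actor step size (assumed reading of "5 10 4")
    alpha_c_learning: float = 5e-4    # C-learning step size (same assumption)
    n_episode: int = 16               # episodes collected per iteration
    n_sgd_c_learning: int = 256       # SGD steps for C-learning
    n_sgd_actor: int = 16             # SGD steps for the actor
    n_archive: int = 1
    batch_size_c_learning: int = 1024
    batch_size_loss: int = 1024


config = LeadsHyperparameters()

# Print the configuration as "name = value" lines for logging or comparison.
for name, value in asdict(config).items():
    print(f"{name} = {value}")
```

Freezing the dataclass keeps the reported values from being changed by accident during a replication run; an overridden copy can be made with `dataclasses.replace(config, n_skill=8)`.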