Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Intrinsically Motivated Goal Exploration Processes with Automatic Curriculum Learning
Authors: Sébastien Forestier, Rémy Portelas, Yoan Mollard, Pierre-Yves Oudeyer
JMLR 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present several implementations of this architecture and demonstrate their ability to automatically generate a learning curriculum within several experimental setups. One of these experiments includes a real humanoid robot exploring multiple spaces of goals with several hundred continuous dimensions and with distractors. We present a systematic experimental study of this new IMGEP algorithm in diverse environments providing opportunities for discovering complex skills like tool use, as well as including complex distractors: a 2D simulated environment, a Minecraft environment, and a real humanoid robotic setup. |
| Researcher Affiliation | Academia | Sébastien Forestier, EMAIL, Inria Bordeaux Sud-Ouest, 200 avenue de la Vieille Tour, 33405 Talence, France; Rémy Portelas, EMAIL, Inria Bordeaux Sud-Ouest, 200 avenue de la Vieille Tour, 33405 Talence, France; Yoan Mollard, EMAIL, Inria Bordeaux Sud-Ouest, 200 avenue de la Vieille Tour, 33405 Talence, France; Pierre-Yves Oudeyer, EMAIL, Inria Bordeaux Sud-Ouest, 200 avenue de la Vieille Tour, 33405 Talence, France |
| Pseudocode | Yes | Architecture 1 Intrinsically Motivated Goal Exploration Process (IMGEP) Require: Action space A, State space S 1: Initialize knowledge E 2: Initialize goal space G and goal policy Γ 3: Initialize policies Π and Πϵ 4: Launch asynchronously the two following loops: 5: loop ▷ Exploration loop 6: Choose goal g in G with Γ 7: Execute a roll-out of Πϵ, observe trajectory τ ▷ From now on f_g′(τ) can be computed to estimate the fitness of the current experiment τ for achieving any goal g′ ∈ G 8: Compute the fitness f = f_g(τ) associated to goal g 9: Compute intrinsic reward r_i = IR(E, g, f) associated to g 10: Update exploration policy Πϵ with (E, g, τ, f) ▷ e.g. fast incremental algo. 11: Update goal policy Γ with (E, g, τ, f, r_i) 12: Update knowledge E with (g, τ, f, r_i) 13: loop ▷ Exploitation loop 14: Update policy Π with E ▷ e.g. batch training of deep NN, SVMs, GMMs 15: Update goal space G with E 16: return Π |
| Open Source Code | Yes | The code of the different environments and experiments is available on GitHub¹. ¹ Code of the IMGEP experiments: https://github.com/sebastien-forestier/IMGEP |
| Open Datasets | No | The paper primarily focuses on custom-built simulated and real-world environments for experimentation, rather than utilizing pre-existing, publicly available datasets. While the code for these environments is provided, no specific public datasets are made available or explicitly linked. |
| Dataset Splits | No | The paper describes intrinsically motivated goal exploration processes in dynamic environments where data is generated through continuous interaction. The methodology does not involve splitting a static dataset into training, validation, or test sets in the conventional sense, as experiments are conducted over a number of 'iterations' or 'runs' in these environments. |
| Hardware Specification | Yes | A Poppy Torso robot (the learning agent) is mounted in front of two joysticks and explores with its left arm. A Poppy Ergo robot (seen as a robotic toy) is controlled by the right joystick and can push a ball that controls some lights and sounds. Poppy is a robust and accessible open-source 3D printed robotic platform (Lapeyre et al., 2014). The population-based IMGEP implementation in its simplest form, with a Nearest Neighbor look-up used as inverse models, is also computationally efficient, as we have run the 20k iterations of the real robotic experiment on a Raspberry Pi 3. |
| Software Dependencies | No | The paper describes algorithmic implementations and theoretical concepts like neural networks, Gaussian basis functions, and kd-trees, but does not specify particular software libraries or frameworks with their version numbers (e.g., Python, PyTorch, TensorFlow, scikit-learn versions) that would be needed to replicate the experiments. |
| Experiment Setup | Yes | In the 2D Simulated environment and the Robotic environment, we implement the motor policies with Radial Basis Functions (RBF). We define 5 Gaussian basis functions with the same shape (σ = 5 for a 50-step trajectory in the 2D environment and σ = 3 for 30 steps in the Robotic environment) and with equally spaced centers (see Fig. 6). The movement of each joint is the result of a weighted sum of the product of 5 parameters and the 5 basis functions. The total vector θ has 20 parameters, in both the 2D Simulated and the Robotic environment. SSPMutation adds a Gaussian noise around those values of θ in the 2D simulated environment (σ = 0.05) and in Minecraft Mountain Cart (σ = 0.3), or adds the Gaussian noise around the previous motor positions (in the robotic environment with joysticks). In 80% of the iterations, the agent uses Πϵ(θ | g, c) to generate with exploration a policy θ and does not update its progress estimation. In the other 20%, it uses Π, without exploration, to generate θ and updates its learning progress estimation in G_k, with the estimated progress in reaching g. Finally, Γ implements a non-stationary bandit algorithm to sample goal spaces. The bandit keeps track of a running average r_i^k of the intrinsic rewards r_i associated to the current goal space G_k. With probability 20%, it samples a random space G_k, and with probability 80%, the probability to sample G_k is proportional to r_i^k in the 2D Simulated and Minecraft environments, or to r_i^k if r_i^k > 0 and 0 otherwise, in the Robotic environment. |
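The Architecture 1 pseudocode quoted in the table can be condensed into a short executable sketch of the population-based variant the paper highlights (random goal sampling, nearest-neighbor look-up as the inverse model, Gaussian mutation of past policies). Everything below is a hypothetical illustration, not the authors' code: `env` stands in for any environment mapping policy parameters to an outcome in the goal space, and the function name and defaults are assumptions.

```python
import numpy as np

def imgep_exploration(env, n_iterations=1000, goal_dim=2, policy_dim=20, sigma=0.05):
    """Minimal population-based IMGEP sketch (hypothetical, illustrative only).

    Keeps a history of (policy, outcome) pairs; each iteration samples a random
    goal, looks up the past policy whose outcome is nearest to that goal
    (nearest-neighbor inverse model), and perturbs it with Gaussian noise.
    """
    history_theta = []    # explored policy parameter vectors
    history_outcome = []  # observed outcomes in the goal space

    # Bootstrap the history with one random policy roll-out.
    theta = np.random.uniform(-1.0, 1.0, policy_dim)
    history_theta.append(theta)
    history_outcome.append(np.asarray(env(theta)))

    for _ in range(n_iterations):
        goal = np.random.uniform(-1.0, 1.0, goal_dim)       # choose goal g with Γ
        outcomes = np.array(history_outcome)
        nearest = np.argmin(np.linalg.norm(outcomes - goal, axis=1))  # NN look-up
        theta = history_theta[nearest] + sigma * np.random.randn(policy_dim)
        history_theta.append(theta)                          # update knowledge E
        history_outcome.append(np.asarray(env(theta)))
    return history_theta, history_outcome
```

In this simplified form the goal policy Γ is uniform and no learning-progress signal is tracked; the paper's full implementations add intrinsic rewards and a bandit over goal spaces on top of this loop.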
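The RBF motor policies described in the Experiment Setup row can likewise be sketched in a few lines. The function name and the assumption of 4 joints × 5 basis functions (giving the 20 parameters mentioned in the paper) are illustrative guesses; the defaults mirror the 2D environment (50-step trajectory, σ = 5).

```python
import numpy as np

def rbf_trajectory(theta, n_joints=4, n_basis=5, n_steps=50, sigma=5.0):
    """Motor trajectory from RBF weights (a sketch, not the authors' code).

    Each joint's movement is a weighted sum of n_basis Gaussian basis
    functions with equally spaced centers along the trajectory.
    """
    t = np.arange(n_steps)
    centers = np.linspace(0, n_steps - 1, n_basis)
    # Basis matrix, shape (n_steps, n_basis): one Gaussian bump per column.
    basis = np.exp(-0.5 * ((t[:, None] - centers[None, :]) / sigma) ** 2)
    weights = np.asarray(theta).reshape(n_joints, n_basis)
    return basis @ weights.T  # shape (n_steps, n_joints)
```

SSPMutation then corresponds to adding Gaussian noise to `theta` before regenerating the trajectory, as in the exploration sketch above.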
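Finally, the non-stationary bandit Γ over goal spaces (20% uniform sampling, otherwise proportional to the running-average intrinsic rewards, zeroed when negative as in the Robotic environment) admits a compact sketch. The function name and the uniform fallback when all averages are non-positive are hypothetical choices, not taken from the paper.

```python
import numpy as np

def sample_goal_space(running_rewards, eps=0.2, rng=None):
    """Sketch of a goal-space bandit (hypothetical helper).

    With probability eps, pick a goal space uniformly at random; otherwise
    pick proportionally to the positive part of the running-average
    intrinsic rewards, falling back to uniform when all averages are zero.
    """
    if rng is None:
        rng = np.random.default_rng()
    r = np.maximum(np.asarray(running_rewards, dtype=float), 0.0)
    if rng.random() < eps or r.sum() == 0.0:
        return int(rng.integers(len(r)))          # uniform exploration
    return int(rng.choice(len(r), p=r / r.sum())) # progress-proportional
```

The running averages themselves would be updated from the intrinsic rewards r_i produced in the exploration loop; that bookkeeping is omitted here.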