Grounding Language to Autonomously-Acquired Skills via Goal Generation
Authors: Ahmed Akakzia, Cédric Colas, Pierre-Yves Oudeyer, Mohamed Chetouani, Olivier Sigaud
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experimental section investigates three questions: (1) How does DECSTR perform in the three phases? (2) How does it compare to end-to-end LC-RL approaches? (3) Do the intermediate representations need to be semantic? |
| Researcher Affiliation | Academia | Ahmed Akakzia, Sorbonne Université (ahmed.akakzia@isir.upmc.fr); Cédric Colas, Inria (cedric.colas@inria.fr); Pierre-Yves Oudeyer, Inria; Mohamed Chetouani, Sorbonne Université; Olivier Sigaud, Sorbonne Université |
| Pseudocode | Yes | Algorithms 1 and 2 present the high-level pseudo-code of any algorithm following the LGB architecture for each of the three phases. |
| Open Source Code | Yes | Code and videos can be found at https://sites.google.com/view/decstr/. |
| Open Datasets | No | A training dataset is collected via interactions between a DECSTR agent trained in phase G→B and a social partner. DECSTR generates semantic goals and pursues them. For each trajectory, the social partner provides a description d of one change in object relations from the initial configuration ci to the final one cf. The set of possible descriptions contains 102 sentences, each describing, in a simplified language, a positive or negative shift for one of the 9 predicates (e.g. "get red above green"). This leads to a dataset D of 5000 triplets: (ci, d, cf). |
| Dataset Splits | No | The paper describes a 'training dataset D' and an 'oracle dataset O' used for evaluation of the LGG, but does not specify explicit train/validation/test splits or percentages for the overall experiments or for the main model training. |
| Hardware Specification | No | This work was performed using HPC resources from GENCI-IDRIS (Grant 20XX-AP010611667), the MeSU platform at Sorbonne Université and the PlaFRIM experimental testbed. ... Each run leverages 24 cpus (24 actors) for about 72h, for a total of 9.8 cpu years. Experiments presented in this paper require machines with at least 24 cpu cores. |
| Software Dependencies | No | The paper mentions algorithms such as SAC and HER and the Adam optimizer, but does not provide specific version numbers for any software components. |
| Experiment Setup | Yes | Implementation details and hyperparameters can be found in Appendix C. ... Table 4: Sensorimotor learning hyperparameters used in DECSTR. |
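The Open Datasets row describes a dataset D of 5000 triplets (ci, d, cf), where each triplet pairs an initial semantic configuration, a simplified-language description of one predicate change, and the final configuration. A minimal sketch of that structure, assuming a 9-bit binary encoding of the predicates (the class name, field names, and helper method are illustrative, not from the paper):

```python
# Hypothetical sketch of one (c_i, d, c_f) triplet from the dataset D
# described above. The 9 binary predicates are encoded as a tuple of 0/1;
# the description d covers a single predicate shift. All names here are
# assumptions for illustration, not the authors' implementation.
from dataclasses import dataclass
from typing import List, Tuple

N_PREDICATES = 9  # paper: 9 semantic predicates over object relations


@dataclass(frozen=True)
class Triplet:
    c_i: Tuple[int, ...]  # initial semantic configuration
    d: str                # simplified-language description of one change
    c_f: Tuple[int, ...]  # final semantic configuration

    def changed_predicates(self) -> List[int]:
        """Indices of predicates that flipped between c_i and c_f."""
        return [k for k, (a, b) in enumerate(zip(self.c_i, self.c_f)) if a != b]


# Example: 'get red above green' describes a positive shift of one predicate.
t = Triplet(c_i=(0,) * 9, d="get red above green", c_f=(1,) + (0,) * 8)
```

A dataset of 5000 such triplets would then simply be a list of `Triplet` instances, one per collected trajectory.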