Grounding Language to Autonomously-Acquired Skills via Goal Generation

Authors: Ahmed Akakzia, Cédric Colas, Pierre-Yves Oudeyer, Mohamed Chetouani, Olivier Sigaud

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experimental section investigates three questions: How does DECSTR perform in the three phases? How does it compare to end-to-end LC-RL approaches? Do we need intermediate representations to be semantic?
Researcher Affiliation | Academia | Ahmed Akakzia (Sorbonne Université, ahmed.akakzia@isir.upmc.fr), Cédric Colas (Inria, cedric.colas@inria.fr), Pierre-Yves Oudeyer (Inria), Mohamed Chetouani (Sorbonne Université), Olivier Sigaud (Sorbonne Université)
Pseudocode | Yes | Algorithms 1 and 2 present the high-level pseudo-code of any algorithm following the LGB architecture for each of the three phases. (A minimal sketch of this three-phase loop appears after the table.)
Open Source Code | Yes | Code and videos can be found at https://sites.google.com/view/decstr/.
Open Datasets | No | A training dataset is collected via interactions between a DECSTR agent trained in phase G→B and a social partner. DECSTR generates semantic goals and pursues them. For each trajectory, the social partner provides a description d of one change in object relations from the initial configuration c_i to the final one c_f. The set of possible descriptions contains 102 sentences, each describing, in simplified language, a positive or negative shift for one of the 9 predicates (e.g., "get red above green"). This leads to a dataset D of 5,000 triplets (c_i, d, c_f). (An illustrative triplet encoding appears after the table.)
Dataset Splits | No | The paper describes a training dataset D and an oracle dataset O used for evaluation of the LGG, but does not specify explicit train/validation/test splits or percentages for the overall experiments or for the main model training.
Hardware Specification | No | This work was performed using HPC resources from GENCI-IDRIS (Grant 20XX-AP010611667), the MeSU platform at Sorbonne-Université and the PlaFRIM experimental testbed. ... Each run leverages 24 CPUs (24 actors) for about 72 h, for a total of 9.8 CPU-years. Experiments presented in this paper require machines with at least 24 CPU cores. (The compute-budget arithmetic is checked after the table.)
Software Dependencies | No | The paper mentions algorithms such as SAC and HER and the Adam optimizer, but does not provide specific version numbers for any software components.
Experiment Setup | Yes | Implementation details and hyperparameters can be found in Appendix C. ... Table 4: Sensorimotor learning hyperparameters used in DECSTR. (An illustrative hyperparameter record appears after the table.)
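For orientation on the Pseudocode row, here is a minimal Python sketch of the three-phase LGB (Language → Goal → Behavior) loop. Every object used here (env, agent, goal_generator, social_partner) is a hypothetical placeholder standing in for the components the paper's Algorithms 1 and 2 describe, not the authors' implementation.

```python
# Minimal sketch of the three-phase LGB loop; all helpers are hypothetical.

def phase_gb(env, agent, n_episodes):
    """Phase G->B: autonomously practice self-generated semantic goals."""
    for _ in range(n_episodes):
        goal = agent.sample_goal()              # a semantic configuration (9 predicates)
        trajectory = agent.rollout(env, goal)
        agent.update(trajectory)                # e.g. SAC with hindsight relabeling

def phase_lg(env, agent, goal_generator, social_partner, n_trajectories):
    """Phase L->G: fit a language-conditioned goal generator on (c_i, d, c_f) triplets."""
    dataset = []
    for _ in range(n_trajectories):
        goal = agent.sample_goal()
        c_i, c_f = agent.rollout_configs(env, goal)  # initial/final semantic configs
        d = social_partner.describe(c_i, c_f)        # one sentence per trajectory
        dataset.append((c_i, d, c_f))
    goal_generator.fit(dataset)                      # e.g. a conditional VAE

def phase_lb(env, agent, goal_generator, instruction):
    """Phase L->B: follow an instruction by sampling a compatible semantic goal."""
    c_i = env.reset()
    goal = goal_generator.sample(c_i, instruction)
    return agent.rollout(env, goal)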
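For the Open Datasets row, one way to picture a (c_i, d, c_f) triplet from the 5,000-triplet dataset D: the 9 binary predicates cover close/above relations between the three blocks. The exact predicate ordering and the flipped index below are assumptions made for the example, not the paper's format.

```python
from dataclasses import dataclass
from typing import Tuple

Config = Tuple[int, ...]  # 9 binary predicates (close/above relations)

@dataclass
class Triplet:
    c_i: Config  # semantic configuration before the trajectory
    d: str       # one of the 102 simplified-language descriptions
    c_f: Config  # semantic configuration after the trajectory

# Illustrative example; the predicate index that flips is assumed.
example = Triplet(
    c_i=(0, 0, 0, 0, 0, 0, 0, 0, 0),
    d="get red above green",
    c_f=(0, 0, 0, 1, 0, 0, 0, 0, 0),  # the 'red above green' predicate turns on
)
```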
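The compute budget in the Hardware Specification row is internally consistent: at 24 CPUs for roughly 72 hours per run, 9.8 CPU-years corresponds to about 50 runs.

```python
# Back-of-the-envelope check of the reported compute budget.
cpu_hours_per_run = 24 * 72                 # 24 CPUs x ~72 h = 1,728 CPU-hours per run
total_cpu_hours = 9.8 * 365 * 24            # 9.8 CPU-years ~= 85,850 CPU-hours
print(total_cpu_hours / cpu_hours_per_run)  # ~49.7, i.e. roughly 50 runs
```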
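Finally, for the Experiment Setup row: the actual values live in the paper's Appendix C (Table 4). The sketch below only illustrates the kind of SAC + HER record such a table contains; every numeric value is a generic placeholder, not a number taken from the paper.

```python
# Generic SAC + HER hyperparameter record. Values are common defaults shown
# purely for illustration; they are NOT the numbers from DECSTR's Table 4.
sensorimotor_hparams = {
    "algorithm": "SAC",
    "replay_strategy": "HER (future)",
    "optimizer": "Adam",
    "actor_lr": 1e-3,        # placeholder
    "critic_lr": 1e-3,       # placeholder
    "batch_size": 256,       # placeholder
    "discount_gamma": 0.98,  # placeholder
    "polyak_tau": 0.95,      # placeholder
    "n_actors": 24,          # matches the 24-CPU / 24-actor setup reported above
}
```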