Pragmatically Learning from Pedagogical Demonstrations in Multi-Goal Environments

Authors: Hugo Caselles-Dupré, Olivier Sigaud, Mohamed Chetouani

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that combining BGI-agents (a pedagogical teacher and a pragmatic learner) results in faster learning and reduced goal ambiguity over standard learning from demonstrations, especially in the few demonstrations regime. We provide the code for our experiments (footnote 1), as well as an illustrative video explaining our approach (footnote 2).
Researcher Affiliation | Academia | Hugo Caselles-Dupré, Olivier Sigaud, Mohamed Chetouani. Sorbonne Université, CNRS, Institut des Systèmes Intelligents et de Robotique (ISIR), Paris, France. casellesdupre.hugo@gmail.com, {olivier.sigaud,mohamed.chetouani}@isir.upmc.fr
Pseudocode | Yes | Algorithm 1: Two-phases training of the teacher and the learner
Open Source Code | Yes | We provide the code for our experiments (footnote 1), as well as an illustrative video explaining our approach (footnote 2). Footnote 1: https://github.com/Caselles/NeurIPS22-demonstrations-pedagogy-pragmatism
Open Datasets | Yes | FBS is a block-stacking environment with two Fetch robots (teacher and learner) equipped with robotic arms, see Fig. 2. It is based on MuJoCo [41] and derived from the Fetch tasks [36].
Dataset Splits | No | The paper describes testing procedures and splits for evaluation metrics (GIA, OGIA, GRA) but does not provide explicit training/validation/test dataset splits in terms of percentages or counts for a fixed dataset, as is common in supervised learning. For reinforcement learning, data is typically generated iteratively rather than from a pre-defined static dataset with fixed splits.
Hardware Specification | No | The paper states
Software Dependencies | No | The paper mentions several software components and algorithms such as
Experiment Setup | Yes | All architecture, training and hyperparameters details are provided in Appendix B.2. ... Additionally, the pedagogical teacher rewards itself with the