Mixed-Initiative Multiagent Apprenticeship Learning for Human Training of Robot Teams

Authors: Esmaeil Seraj, Jerry Xiong, Mariah Schrum, Matthew Gombolay

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Quoting Section 5 (Evaluation): "We demonstrate through empirical evaluation and a human-subject experiment that our LfD-based MixTURE outperforms RL-based methods due to reward-function independence and low sample complexity. MixTURE outperforms a variety of relevant baselines on diverse data generated by human experts in complex heterogeneous domains. MixTURE is the first MA-LfD framework to enable learning multi-robot collaborative policies directly from real human data, resulting in 44% less human workload and a 46% higher usability score."
Researcher Affiliation | Academia | Esmaeil Seraj (Georgia Institute of Technology, eseraj3@gatech.edu); Jerry Xiong (Georgia Institute of Technology, jxiong60@gatech.edu); Mariah Schrum (University of California, Berkeley, mariahschrum@berkeley.edu); Matthew Gombolay (Georgia Institute of Technology, matthew.gombolay@cc.gatech.edu)
Pseudocode | No | The paper contains no sections labeled "Pseudocode" or "Algorithm" and no code-formatted blocks; it includes an architecture diagram (Figure 1) and mathematical equations. (A hedged sketch of what a generic multi-agent LfD training loop could look like is given after this table.)
Open Source Code | No | The paper does not explicitly state that source code is released, nor does it link to a code repository for the described methodology.
Open Datasets | No | Quoting the paper: "We evaluate MixTURE on real, diverse human-generated data, collected in a human-subject user study, and show that, in a complex multi-agent domain with heterogeneous tasks, we are able to achieve 42%–77% higher performance and a significantly lower sample complexity. We also show that using MixTURE significantly improves workload and system usability relative to a benchmark MA-LfD framework. To the best of our knowledge, this is the first work to train a MA-LfD framework on real human data." The paper collects its own data through a human-subject user study but does not state that this data is publicly available or provide access information.
Dataset Splits | No | The paper mentions training and testing data but does not specify a validation split or its proportion; its reference to "best training models" implies some internal validation, but no details are provided. (An illustrative train/validation/test split is sketched after this table.)
Hardware Specification | No | The paper gives no hardware details (e.g., GPU models, CPU types, memory) for running the experiments.
Software Dependencies | No | The paper mentions algorithms and frameworks such as PPO, GAIL, and the CTDE paradigm, but does not specify versions for any key software components or libraries required for reproducibility. (A version-reporting sketch follows this table.)
Experiment Setup | No | The paper describes the overall architecture and training process, including the loss function (Eq. 2), the PPO optimizer, and a tunable scaling parameter λ, but provides no concrete hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed training configurations. (A sketch of a λ-scaled composite objective follows this table.)
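
On the missing pseudocode: below is a minimal, hypothetical sketch of the kind of training loop such pseudocode might describe, namely independent per-agent behavior cloning on expert demonstrations. This is not the authors' MixTURE algorithm; the network shapes, batch size, and synthetic data are illustrative placeholders only.

```python
# Hypothetical multi-agent behavior-cloning loop (NOT the paper's MixTURE
# algorithm). All dimensions and the synthetic batch are placeholders.
import torch
import torch.nn as nn

n_agents, obs_dim, act_dim, batch = 3, 8, 4, 32
policies = [nn.Linear(obs_dim, act_dim) for _ in range(n_agents)]  # one policy per agent
opt = torch.optim.Adam([p for pi in policies for p in pi.parameters()], lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):
    # Placeholder expert observations and discrete expert actions.
    obs = torch.randn(n_agents, batch, obs_dim)
    expert_acts = torch.randint(act_dim, (n_agents, batch))
    # Sum the per-agent imitation losses and take one joint gradient step.
    loss = sum(loss_fn(policies[i](obs[i]), expert_acts[i]) for i in range(n_agents))
    opt.zero_grad()
    loss.backward()
    opt.step()
```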
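
On dataset splits: the sketch below shows the explicit train/validation/test partition that the paper leaves unspecified. The 80/10/10 proportions and the placeholder demonstration list are assumptions, not values from the paper.

```python
# Illustrative three-way split; proportions are assumed, not from the paper.
from sklearn.model_selection import train_test_split

demos = list(range(100))  # placeholder for human demonstration trajectories
train, heldout = train_test_split(demos, test_size=0.2, random_state=0)
val, test = train_test_split(heldout, test_size=0.5, random_state=0)
print(len(train), len(val), len(test))  # 80 10 10
```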
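
On software dependencies: one low-effort remedy is to report exact library versions alongside results. The snippet below assumes a PyTorch/NumPy stack, which the paper does not confirm; it only illustrates what such reporting looks like.

```python
# Version-reporting sketch; the PyTorch/NumPy stack is an assumption.
import sys
import numpy as np
import torch

print(f"python {sys.version.split()[0]}")
print(f"numpy  {np.__version__}")
print(f"torch  {torch.__version__}")
```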
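
Finally, on the experiment setup: the paper's mention of Eq. 2 with a tunable scaling parameter λ suggests a composite objective of the general form L = L_imitation + λ·L_aux. The sketch below implements that general form with placeholder terms; the concrete loss components and the λ value are assumptions, not the paper's actual Eq. 2.

```python
# Hedged sketch of a λ-scaled composite objective; the imitation and
# auxiliary terms are placeholders, not the paper's Eq. 2.
import torch
import torch.nn.functional as F

lam = 0.1  # tunable scaling parameter λ (value is an assumption)

def composite_loss(action_logits, expert_actions, aux_pred, aux_target):
    l_imitation = F.cross_entropy(action_logits, expert_actions)  # match expert actions
    l_aux = F.mse_loss(aux_pred, aux_target)  # auxiliary term (e.g., communication)
    return l_imitation + lam * l_aux

# Example call with random placeholder tensors.
loss = composite_loss(torch.randn(32, 4), torch.randint(4, (32,)),
                      torch.randn(32, 2), torch.randn(32, 2))
```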