Mixed-Initiative Multiagent Apprenticeship Learning for Human Training of Robot Teams

Authors: Esmaeil Seraj, Jerry Xiong, Mariah Schrum, Matthew Gombolay

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Quoting Section 5 (Evaluation): "We demonstrate through empirical evaluation and a human-subject experiment that our LfD-based MixTURE outperforms RL-based methods due to reward-function independence and low sample complexity. MixTURE outperforms a variety of relevant baselines on diverse data generated by human experts in complex heterogeneous domains. MixTURE is the first MA-LfD framework to enable learning multi-robot collaborative policies directly from real human data, resulting in 44% less human workload and a 46% higher usability score."
Researcher Affiliation | Academia | Esmaeil Seraj (Georgia Institute of Technology, eseraj3@gatech.edu); Jerry Xiong (Georgia Institute of Technology, jxiong60@gatech.edu); Mariah Schrum (University of California, Berkeley, mariahschrum@berkeley.edu); Matthew Gombolay (Georgia Institute of Technology, matthew.gombolay@cc.gatech.edu)
Pseudocode | No | The paper contains no sections labeled "Pseudocode" or "Algorithm" and no code-formatted blocks; it includes an architecture diagram (Figure 1) and mathematical equations. (A hedged sketch of what a generic multi-agent LfD training loop could look like is given after this table.)
Open Source Code | No | The paper does not explicitly state that source code is released, nor does it link to a code repository for the described methodology.
Open Datasets | No | Quoting the paper: "We evaluate MixTURE on real, diverse human-generated data, collected in a human-subject user study, and show that, in a complex multi-agent domain with heterogeneous tasks, we are able to achieve 42%–77% higher performance and a significantly lower sample complexity. We also show that using MixTURE significantly improves workload and system usability relative to a benchmark MA-LfD framework. To the best of our knowledge, this is the first work to train a MA-LfD framework on real human data." The paper collects its own data through a human-subject user study but does not state that this data is publicly available or provide access information.
Dataset Splits | No | The paper mentions training and testing data but does not specify a validation split or its proportion; its reference to "best training models" implies some internal validation, but no details are provided. (An illustrative train/validation/test split is sketched after this table.)
Hardware Specification | No | The paper gives no hardware details (e.g., GPU models, CPU types, memory) for running the experiments.
Software Dependencies | No | The paper mentions algorithms and frameworks such as PPO, GAIL, and the CTDE paradigm, but does not specify versions for any key software components or libraries required for reproducibility. (A version-reporting sketch follows this table.)
Experiment Setup | No | The paper describes the overall architecture and training process, including the loss function (Eq. 2), the PPO optimizer, and a tunable scaling parameter λ, but provides no concrete hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed training configurations. (A sketch of a λ-scaled composite objective follows this table.)
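
On the missing pseudocode: below is a minimal, hypothetical sketch of the kind of training loop such pseudocode might describe, namely independent per-agent behavior cloning on expert demonstrations. This is not the authors' MixTURE algorithm; the network shapes, batch size, and synthetic data are illustrative placeholders only.

```python
# Hypothetical multi-agent behavior-cloning loop (NOT the paper's MixTURE
# algorithm). All dimensions and the synthetic batch are placeholders.
import torch
import torch.nn as nn

n_agents, obs_dim, act_dim, batch = 3, 8, 4, 32
policies = [nn.Linear(obs_dim, act_dim) for _ in range(n_agents)]  # one policy per agent
opt = torch.optim.Adam([p for pi in policies for p in pi.parameters()], lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):
    # Placeholder expert observations and discrete expert actions.
    obs = torch.randn(n_agents, batch, obs_dim)
    expert_acts = torch.randint(act_dim, (n_agents, batch))
    # Sum the per-agent imitation losses and take one joint gradient step.
    loss = sum(loss_fn(policies[i](obs[i]), expert_acts[i]) for i in range(n_agents))
    opt.zero_grad()
    loss.backward()
    opt.step()
```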
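
On dataset splits: the sketch below shows the explicit train/validation/test partition that the paper leaves unspecified. The 80/10/10 proportions and the placeholder demonstration list are assumptions, not values from the paper.

```python
# Illustrative three-way split; proportions are assumed, not from the paper.
from sklearn.model_selection import train_test_split

demos = list(range(100))  # placeholder for human demonstration trajectories
train, heldout = train_test_split(demos, test_size=0.2, random_state=0)
val, test = train_test_split(heldout, test_size=0.5, random_state=0)
print(len(train), len(val), len(test))  # 80 10 10
```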
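
On software dependencies: one low-effort remedy is to report exact library versions alongside results. The snippet below assumes a PyTorch/NumPy stack, which the paper does not confirm; it only illustrates what such reporting looks like.

```python
# Version-reporting sketch; the PyTorch/NumPy stack is an assumption.
import sys
import numpy as np
import torch

print(f"python {sys.version.split()[0]}")
print(f"numpy  {np.__version__}")
print(f"torch  {torch.__version__}")
```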
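
Finally, on the experiment setup: the paper's mention of Eq. 2 with a tunable scaling parameter λ suggests a composite objective of the general form L = L_imitation + λ·L_aux. The sketch below implements that general form with placeholder terms; the concrete loss components and the λ value are assumptions, not the paper's actual Eq. 2.

```python
# Hedged sketch of a λ-scaled composite objective; the imitation and
# auxiliary terms are placeholders, not the paper's Eq. 2.
import torch
import torch.nn.functional as F

lam = 0.1  # tunable scaling parameter λ (value is an assumption)

def composite_loss(action_logits, expert_actions, aux_pred, aux_target):
    l_imitation = F.cross_entropy(action_logits, expert_actions)  # match expert actions
    l_aux = F.mse_loss(aux_pred, aux_target)  # auxiliary term (e.g., communication)
    return l_imitation + lam * l_aux

# Example call with random placeholder tensors.
loss = composite_loss(torch.randn(32, 4), torch.randint(4, (32,)),
                      torch.randn(32, 2), torch.randn(32, 2))
```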