Mixed-Initiative Multiagent Apprenticeship Learning for Human Training of Robot Teams
Authors: Esmaeil Seraj, Jerry Xiong, Mariah Schrum, Matthew Gombolay
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 5 (Evaluation): We demonstrate through empirical evaluation and a human-subject experiment that our LfD-based MixTURE outperforms RL-based methods due to reward-function independence and low sample complexity. MixTURE outperforms a variety of relevant baselines on diverse data generated by human experts in complex heterogeneous domains. MixTURE is the first MA-LfD framework to enable learning multi-robot collaborative policies directly from real human data, resulting in 44% less human workload and a 46% higher usability score. |
| Researcher Affiliation | Academia | Esmaeil Seraj Georgia Institute of Technology eseraj3@gatech.edu Jerry Xiong Georgia Institute of Technology jxiong60@gatech.edu Mariah Schrum University of California Berkeley mariahschrum@berkeley.edu Matthew Gombolay Georgia Institute of Technology matthew.gombolay@cc.gatech.edu |
| Pseudocode | No | The paper does not contain any sections explicitly labeled 'Pseudocode' or 'Algorithm', nor are there any clearly formatted code-like blocks. It includes an architecture diagram (Figure 1) and mathematical equations. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code, nor does it include links to a code repository for the methodology described. |
| Open Datasets | No | We evaluate MixTURE on real, diverse human-generated data, collected in a human-subject user study, and show that, in a complex multi-agent domain with heterogeneous tasks, we are able to achieve 42% to 77% higher performance and a significantly lower sample complexity. We also show that using MixTURE significantly improves workload and system usability relative to a benchmark MA-LfD framework. To the best of our knowledge, this is the first work to train an MA-LfD framework on real human data. (The paper collects its own data through a human-subject user study but does not provide access information or explicitly state that this data is publicly available.) |
| Dataset Splits | No | The paper mentions training and testing data but does not explicitly specify a validation split or its proportion. It mentions using 'best training models', which implies some internal validation, but no details are provided. |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware specifications (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions algorithms and frameworks like PPO and GAIL, and a CTDE paradigm, but it does not specify versions for any key software components or libraries required for reproducibility. |
| Experiment Setup | No | The paper describes the overall architecture and training process, including the loss function (Eq. 2) and optimization algorithms (PPO), and a tunable scaling parameter λ, but it does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed training configurations. |
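
The Hardware Specification, Software Dependencies, Experiment Setup, and Dataset Splits rows all flag details that were not reported. As a minimal sketch, assuming one wanted to record those details in a checkable form, the items those rows name can be collected into a single record and scanned for gaps. The class `ExperimentSetup`, the function `missing_fields`, and every field name are hypothetical illustrations, not anything from the paper; `None` stands for "not reported", and no values from the paper are implied.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List, Optional


@dataclass
class ExperimentSetup:
    """Hypothetical record of the setup details the table above checks for.

    Field names are illustrative assumptions; None means "not reported".
    """
    # Hardware Specification row
    gpu_model: Optional[str] = None
    cpu_model: Optional[str] = None
    memory_gb: Optional[float] = None
    # Software Dependencies row: library name -> pinned version string
    library_versions: Dict[str, str] = field(default_factory=dict)
    # Experiment Setup row
    learning_rate: Optional[float] = None
    batch_size: Optional[int] = None
    num_epochs: Optional[int] = None
    lambda_scale: Optional[float] = None  # the tunable scaling parameter λ
    # Dataset Splits row
    validation_fraction: Optional[float] = None


def missing_fields(setup: ExperimentSetup) -> List[str]:
    """Return the names of fields left unreported (None or an empty dict)."""
    missing = []
    for f in fields(setup):
        value = getattr(setup, f.name)
        if value is None or (isinstance(value, dict) and not value):
            missing.append(f.name)
    return missing


if __name__ == "__main__":
    # With nothing filled in, every field is reported as missing.
    print(missing_fields(ExperimentSetup()))
```

For the paper assessed here, every field would stay unset, so `missing_fields(ExperimentSetup())` returns all nine names; filling in a field (for example, a placeholder `learning_rate`) removes it from the list.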