Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Semi-Supervised Imitation Learning of Team Policies from Suboptimal Demonstrations

Authors: Sangwon Seo, Vaibhav V. Unhelkar

IJCAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate and benchmark the performance of BTIL on synthetic multi-agent tasks as well as a novel dataset of human-agent teamwork. Our experiments show that BTIL can successfully learn team policies from demonstrations despite the influence of team members (time-varying and potentially misaligned) mental states on their behavior.
Researcher Affiliation | Academia | Sangwon Seo and Vaibhav V. Unhelkar Rice University EMAIL
Pseudocode | Yes | Algorithm 1 Bayesian Team Imitation Learner (BTIL)
Open Source Code | No | No explicit statement providing open-source code for the methodology or a direct link to a code repository was found.
Open Datasets | No | No concrete access information (specific link, DOI, repository name, formal citation) for a publicly available or open dataset was provided for either the synthetic or human-AI teamwork datasets created by the authors.
Dataset Splits | No | For each domain, we generate 200 demonstrations for training and 100 for evaluation.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running the experiments were provided.
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., 'Python 3.8, PyTorch 1.9, and CUDA 11.1').
Experiment Setup | No | The paper mentions that 'Implementation details of BTIL and the baselines are provided in Appendix D' but does not include specific hyperparameter values or detailed training configurations in the provided text.