Coordinated Multi-Agent Imitation Learning

Authors: Hoang M. Le, Yisong Yue, Peter Carr, Patrick Lucey

ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the effectiveness of our method in two settings. The first is a synthetic experiment based on the popular predator-prey game. The second is a challenging task of learning multiple policies for team defense in professional soccer, using a large training set (see footnote 1) of play sequences illustrated by Figure 1. We show that learning a good latent structure to encode implicit coordination yields significantly superior imitation performance compared to conventional baselines.
Researcher Affiliation | Collaboration | California Institute of Technology, Pasadena, CA; Disney Research, Pittsburgh, PA; STATS LLC, Chicago, IL.
Pseudocode | Yes | Algorithm 1, 'Coordinated Multi-Agent Imitation Learning' (see the sketch after this table).
Open Source Code | No | Footnote 1 states 'Data at http://www.stats.com/data-science/ and see video result at http://hoangminhle.github.io'. These links point to a data source and a video of results, not to the source code for the methodology.
Open Datasets | No | The paper mentions using 'tracking data from 45 games of real professional soccer', states that 'The demonstration data is collected from 1000 game instances' for the predator-prey domain, and footnote 1 points to 'Data at http://www.stats.com/data-science/'. However, this is a general link to a company's data-science page, not a specific direct URL, DOI, or repository for the exact datasets used, with proper bibliographic information or attribution.
Dataset Splits | No | Algorithm 1 runs 'until No improvement on validation set', implying a validation set is used. However, the experimental sections do not provide specifics such as split percentages, sample counts, or selection methodology for that set, which are necessary for reproducibility.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, memory, or cloud instance types used to run the experiments.
Software Dependencies | No | The paper mentions using a 'recurrent neural network structure (LSTM)' and 'random forest', but it does not specify software names with version numbers (e.g., TensorFlow 2.x, PyTorch 1.x, scikit-learn 0.x) for reproducibility.
Experiment Setup | Yes | Each policy is represented by a recurrent neural network structure (LSTM), with two hidden layers of 512 units each. As LSTMs generally require fixed-length input sequences, each trajectory is further chunked into sub-sequences of length 50, with an overlapping window of 25 time steps. The joint multi-agent imitation learning procedure follows Algorithm 2 closely. In this setup, without access to dynamic oracles for imitation learning in the style of SEARN (Daumé III et al., 2009) and DAgger (Ross et al., 2011), the horizon of the rolled-out trajectories is gradually increased from 1 to 10 steps of lookahead. (A hedged sketch of this stated setup also follows the table.)
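
For orientation, the sketch below (referenced from the Pseudocode row) is a minimal Python rendering of the alternating structure that Algorithm 1 describes at a high level: assign each agent's trajectory to a latent role via a coordination step, train one imitation policy per role on the role-sorted data, and stop when a held-out validation loss stops improving. The paper releases no code, so every name here (assign_roles, coordinated_training, cost, train_policy, val_loss, and the validation split itself) is a hypothetical filler; the role assignment is rendered as a min-cost matching, one standard way to implement such a step, not necessarily the paper's exact procedure.

import numpy as np
from scipy.optimize import linear_sum_assignment

def assign_roles(trajectories, cost):
    """Min-cost matching of K agent trajectories to K latent roles.

    `cost(traj, role)` is an assumed scoring function under the current
    coordination model; the Hungarian algorithm solves the matching.
    """
    K = len(trajectories)
    C = np.array([[cost(trajectories[i], k) for k in range(K)]
                  for i in range(K)])
    rows, cols = linear_sum_assignment(C)
    return {i: k for i, k in zip(rows, cols)}

def coordinated_training(train_set, val_set, policies, cost,
                         train_policy, val_loss):
    """Alternate role assignment and per-role imitation learning."""
    best = float("inf")
    while True:
        # (a) relabel each demonstration so trajectories are ordered by
        #     their assigned roles
        relabeled = []
        for trajs in train_set:
            m = assign_roles(trajs, cost)
            relabeled.append([trajs[i] for i in sorted(m, key=m.get)])
        # (b) train one policy per role on the role-sorted data
        for k, pi in enumerate(policies):
            train_policy(pi, [trajs[k] for trajs in relabeled])
        # stop when the held-out loss stops improving
        # ("until No improvement on validation set")
        loss = val_loss(policies, val_set)
        if loss >= best:
            break
        best = loss
    return policies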
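
The experiment-setup details the paper does state (two-layer 512-unit LSTM policies, length-50 chunks with a 25-step overlapping window, and a rollout horizon grown from 1 to 10) are concrete enough to sketch directly. The snippet below uses PyTorch purely as an assumed stand-in, since the paper names no framework; feature_dim and action_dim are hypothetical placeholders.

import torch.nn as nn

def chunk_trajectory(traj, length=50, stride=25):
    """Split a (T, feature_dim) trajectory into fixed-length sub-sequences
    of length 50 with an overlapping window of 25 time steps."""
    return [traj[s:s + length] for s in range(0, len(traj) - length + 1, stride)]

class RolePolicy(nn.Module):
    """One per-role policy: an LSTM with two hidden layers of 512 units each."""
    def __init__(self, feature_dim, action_dim):
        super().__init__()
        self.lstm = nn.LSTM(feature_dim, 512, num_layers=2, batch_first=True)
        self.head = nn.Linear(512, action_dim)

    def forward(self, x):      # x: (batch, 50, feature_dim)
        h, _ = self.lstm(x)
        return self.head(h)    # predicted action at every time step

# Without a dynamic oracle (SEARN/DAgger style), the rolled-out horizon is
# increased gradually from 1 to 10 steps of lookahead during training:
for horizon in range(1, 11):
    pass  # roll each policy out `horizon` steps, then fit to the demonstrations

The horizon schedule matters because rolling out a freshly initialized policy for many steps compounds its errors; growing the lookahead from 1 to 10 lets the policies first fit single-step behavior before being trained on their own longer rollouts.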