Coordinated Multi-Agent Imitation Learning
Authors: Hoang M. Le, Yisong Yue, Peter Carr, Patrick Lucey
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of our method in two settings. The first is a synthetic experiment based on the popular predator-prey game. The second is a challenging task of learning multiple policies for team defense in professional soccer, using a large training set¹ of play sequences illustrated by Figure 1. We show that learning a good latent structure to encode implicit coordination yields significantly superior imitation performance compared to conventional baselines. |
| Researcher Affiliation | Collaboration | ¹California Institute of Technology, Pasadena, CA; ²Disney Research, Pittsburgh, PA; ³STATS LLC, Chicago, IL. |
| Pseudocode | Yes | Algorithm 1 Coordinated Multi-Agent Imitation Learning. A hedged sketch of the algorithm's alternating structure follows the table. |
| Open Source Code | No | Footnote 1 states 'Data at http://www.stats.com/data-science/ and see video result at http://hoangminhle.github.io'. This links to a data source and a video result, not the source code for the methodology. |
| Open Datasets | No | The paper mentions using 'tracking data from 45 games of real professional soccer' and 'The demonstration data is collected from 1000 game instances' for the predator-prey domain, and footnote 1 states 'Data at http://www.stats.com/data-science/'. However, this is a general link to a company's data science page, not a specific direct URL, DOI, or repository for the exact datasets used with proper bibliographic information or attribution. |
| Dataset Splits | No | Algorithm 1 includes 'until No improvement on validation set', implying a validation set is used. However, the experimental sections do not provide specific details such as split percentages, sample counts, or methodology for the validation set, which are necessary for reproducibility. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, memory, or cloud instance types used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'recurrent neural network structure (LSTM)' and 'random forest', but it does not specify software names with version numbers (e.g., TensorFlow 2.x, PyTorch 1.x, scikit-learn 0.x) for reproducibility. |
| Experiment Setup | Yes | Each policy is represented by a recurrent neural network structure (LSTM), with two hidden layers of 512 units each. As LSTMs generally require fixed-length input sequences, we further chunk each trajectory into sub-sequences of length 50, with overlapping window of 25 time steps. The joint multi-agent imitation learning procedure follows Algorithm 2 closely. In this setup, without access to dynamic oracles for imitation learning in the style of SEARN (Daumé III et al., 2009) and DAgger (Ross et al., 2011), we gradually increase the horizon of the rolled-out trajectories from 1 to 10 steps lookahead. A sketch of the chunking and LSTM configuration also follows the table. |
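
To make the Pseudocode row concrete, here is a minimal sketch of the alternating structure that Algorithm 1 describes (latent role assignment interleaved with per-role imitation learning, stopping on the "until No improvement on validation set" rule quoted in the Dataset Splits row). Every helper name below (`assign_roles`, `validation_loss`, the `.fit` methods) is a hypothetical stand-in; per the Open Source Code row, the authors released no implementation.

```python
# Hedged sketch of Algorithm 1's alternating loop. All helpers are
# hypothetical placeholders, not the authors' code.

def coordinated_imitation(demos, policies, structure, val_demos, max_iters=50):
    """Alternate latent role assignment with per-role imitation learning."""
    best_val = float("inf")
    for _ in range(max_iters):
        # (1) Infer the latent role ordering that best explains each
        #     demonstration under the current policies and structure model.
        ordered = [assign_roles(d, policies, structure) for d in demos]

        # (2) Refit the coordination (latent structure) model on the
        #     newly re-ordered demonstrations.
        structure.fit(ordered)

        # (3) Imitation-learn one policy per role on the re-ordered data.
        for k, policy in enumerate(policies):
            policy.fit([d[k] for d in ordered])

        # (4) Stop once the validation set no longer improves (the
        #     stopping rule quoted from Algorithm 1).
        val = validation_loss(policies, structure, val_demos)
        if val >= best_val:
            break
        best_val = val
    return policies, structure
```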
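
The Experiment Setup row also lends itself to a sketch. Assuming PyTorch (the Software Dependencies row notes the paper names no framework), the two-layer 512-unit LSTM policy and the length-50 / 25-step-overlap chunking could look like the following; `obs_dim` and `act_dim` are placeholders the paper does not specify.

```python
import numpy as np
import torch.nn as nn

def chunk_trajectory(traj, length=50, stride=25):
    """Split a (T, d) trajectory into fixed-length windows with a
    25-step overlap, matching the scheme quoted in the setup row."""
    starts = range(0, len(traj) - length + 1, stride)
    return np.stack([traj[s:s + length] for s in starts])

class LSTMPolicy(nn.Module):
    """Two stacked LSTM layers of 512 units each, per the setup row.
    Input/output dimensions are placeholders, not stated in the paper."""
    def __init__(self, obs_dim, act_dim, hidden=512):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, act_dim)

    def forward(self, x, state=None):
        out, state = self.lstm(x, state)   # out: (batch, time, hidden)
        return self.head(out), state

# Roll-out horizon curriculum: without a dynamic oracle, the lookahead of
# the rolled-out trajectories grows from 1 to 10 steps (training omitted).
for horizon in range(1, 11):
    pass  # train each LSTMPolicy on horizon-step rollouts at this stage
```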