Imitation Learning via Kernel Mean Embedding
Authors: Kee-Eung Kim, Hyun Soo Park
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on the same set of high-dimensional control imitation tasks with the identical settings as in the GAIL paper, with the largest task involving 376 observation and 17 action dimensions, demonstrate that the proposed approach performs better than or on a par with GAIL, and significantly outperforms GAIL particularly when the expert demonstration is scarce, with performance gain up to 41%. |
| Researcher Affiliation | Academia | Kee-Eung Kim School of Computer Science KAIST kekim@cs.kaist.ac.kr Hyun Soo Park Department of Computer Science and Engineering University of Minnesota hspark@umn.edu |
| Pseudocode | Yes | Algorithm 1: Generative Moment Matching Imitation Learning |
| Open Source Code | No | No explicit statement of the authors' own source code release for their methodology. The paper states they 'mostly leveraged the GAIL source code for implementing GMMIL and conducting experiments' and provides a link to the GAIL repository, but not their specific GMMIL implementation or associated code. |
| Open Datasets | No | The paper refers to the 'demonstration dataset D_πE provided by the expert' and uses environments such as OpenAI Gym and MuJoCo, but does not provide concrete access information (link, DOI, or a specific citation with author/year for the dataset itself) for the expert demonstration datasets used for training. |
| Dataset Splits | No | The paper mentions 'varying numbers of expert trajectories' but does not specify exact percentages or sample counts for training, validation, or test dataset splits of the expert demonstration data. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, memory specifications, or cloud instance types used for running experiments. |
| Software Dependencies | No | The paper mentions software such as OpenAI Gym, the MuJoCo simulator, and TRPO, but does not provide specific version numbers for these or any other software dependencies needed for replication. |
| Experiment Setup | Yes | For fair comparison, we used the same experimental settings as in (Ho and Ermon 2016), including the exactly same neural network architectures for the policies and the optimizer parameters for TRPO. ... The first bandwidth parameter σ1 was selected as the median of the pairwise squared-ℓ2 distances among the data points from the expert policy and from the initial policy. The second bandwidth parameter σ2 was selected as the median of the pairwise squared-ℓ2 distances among the data points only from the expert policy... |
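The bandwidth selection quoted above is the standard median heuristic: each σ is the median of the pairwise squared-ℓ2 distances over a chosen pool of samples (σ1 over expert plus initial-policy samples, σ2 over expert samples only). A minimal sketch of that computation, assuming a hypothetical `median_bandwidth` helper and toy stand-in data (the feature dimensions and sample counts below are illustrative, not from the paper):

```python
import numpy as np

def median_bandwidth(x, y=None):
    """Median heuristic: the median of pairwise squared-L2 distances
    among the (optionally stacked) data points. Hypothetical helper
    mirroring the sigma_1 / sigma_2 selection described in the paper."""
    data = x if y is None else np.vstack([x, y])
    # Pairwise squared Euclidean distances via the expansion
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq = np.sum(data ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * data @ data.T
    # Keep strictly upper-triangular entries to drop zero self-distances
    iu = np.triu_indices(len(data), k=1)
    return float(np.median(d2[iu]))

# sigma_1: median over expert + initial-policy samples;
# sigma_2: median over expert samples only.
rng = np.random.default_rng(0)
expert = rng.normal(size=(100, 4))   # toy stand-in for expert state-action features
initial = rng.normal(size=(100, 4))  # toy stand-in for initial-policy samples
sigma1 = median_bandwidth(expert, initial)
sigma2 = median_bandwidth(expert)
```

Using two bandwidths lets the resulting mixture-of-RBF kernel capture distance scales both across the two sample sets and within the expert data alone.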