Imitation Learning via Kernel Mean Embedding
Authors: Kee-Eung Kim, Hyun Soo Park
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on the same set of high-dimensional control imitation tasks with the identical settings as in the GAIL paper, with the largest task involving 376 observation and 17 action dimensions, demonstrate that the proposed approach performs better than or on a par with GAIL, and significantly outperforms GAIL particularly when the expert demonstration is scarce, with performance gain up to 41%. |
| Researcher Affiliation | Academia | Kee-Eung Kim School of Computer Science KAIST kekim@cs.kaist.ac.kr Hyun Soo Park Department of Computer Science and Engineering University of Minnesota hspark@umn.edu |
| Pseudocode | Yes | Algorithm 1: Generative Moment Matching Imitation Learning |
| Open Source Code | No | No explicit statement of the authors' own source code release for their methodology. The paper states they 'mostly leveraged the GAIL source code for implementing GMMIL and conducting experiments' and provides a link to the GAIL repository, but not their specific GMMIL implementation or associated code. |
| Open Datasets | No | The paper refers to the 'demonstration dataset D_πE provided by the expert' and uses environments such as OpenAI Gym and MuJoCo, but does not provide concrete access information (link, DOI, or a specific citation with author/year for the dataset itself) for the expert demonstration datasets used for training. |
| Dataset Splits | No | The paper mentions 'varying numbers of expert trajectories' but does not specify exact percentages or sample counts for training, validation, or test dataset splits of the expert demonstration data. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, memory specifications, or cloud instance types used for running experiments. |
| Software Dependencies | No | The paper mentions software such as OpenAI Gym, the MuJoCo simulator, and TRPO, but does not provide specific version numbers for these or any other software dependencies needed for replication. |
| Experiment Setup | Yes | For fair comparison, we used the same experimental settings as in (Ho and Ermon 2016), including the exactly same neural network architectures for the policies and the optimizer parameters for TRPO. ... The first bandwidth parameter σ1 was selected as the median of the pairwise squared-ℓ2 distances among the data points from the expert policy and from the initial policy. The second bandwidth parameter σ2 was selected as the median of the pairwise squared-ℓ2 distances among the data points only from the expert policy... |
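The bandwidth selection quoted above is the standard median heuristic: each σ is the median of the pairwise squared-ℓ2 distances over a chosen pool of samples (σ1 over expert plus initial-policy samples, σ2 over expert samples only). A minimal sketch of that computation, assuming a hypothetical `median_bandwidth` helper and toy stand-in data (the feature dimensions and sample counts below are illustrative, not from the paper):

```python
import numpy as np

def median_bandwidth(x, y=None):
    """Median heuristic: the median of pairwise squared-L2 distances
    among the (optionally stacked) data points. Hypothetical helper
    mirroring the sigma_1 / sigma_2 selection described in the paper."""
    data = x if y is None else np.vstack([x, y])
    # Pairwise squared Euclidean distances via the expansion
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq = np.sum(data ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * data @ data.T
    # Keep strictly upper-triangular entries to drop zero self-distances
    iu = np.triu_indices(len(data), k=1)
    return float(np.median(d2[iu]))

# sigma_1: median over expert + initial-policy samples;
# sigma_2: median over expert samples only.
rng = np.random.default_rng(0)
expert = rng.normal(size=(100, 4))   # toy stand-in for expert state-action features
initial = rng.normal(size=(100, 4))  # toy stand-in for initial-policy samples
sigma1 = median_bandwidth(expert, initial)
sigma2 = median_bandwidth(expert)
```

Using two bandwidths lets the resulting mixture-of-RBF kernel capture distance scales both across the two sample sets and within the expert data alone.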