Imitation Learning from Imperfect Demonstration
Authors: Yueh-Hua Wu, Nontawat Charoenphakdee, Han Bao, Voot Tangkaratt, Masashi Sugiyama
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5. Experiments In this section, we aim to answer the following questions with experiments. (1) Do 2IWIL and IC-GAIL methods allow agents to learn near-optimal policies when limited confidence information is given? (2) Are the methods robust enough when the given confidence is less accurate? and (3) Do more unlabeled data results in better performance in terms of average return? The discussions are given in Sec. 5.1, 5.2, and 5.3 respectively. Setup To collect demonstration data, we train an optimal policy (πopt) using TRPO (Schulman et al., 2015) and select two intermediate policies (π1 and π2). The three policies are used to generate the same number of state-action pairs. ... We compare the proposed methods against three baselines. ... To assess our methods, we conduct experiments on Mujoco (Todorov et al., 2012). Each experiment is performed with five random seeds. |
| Researcher Affiliation | Academia | 1National Taiwan University, Taiwan 2RIKEN Center for Advanced Intelligence Project, Japan 3The University of Tokyo, Japan. |
| Pseudocode | Yes | Algorithm 1 2IWIL |
| Open Source Code | No | The paper does not provide explicit statements or links to open-source code for the described methodology. |
| Open Datasets | Yes | To assess our methods, we conduct experiments on Mujoco (Todorov et al., 2012). Each experiment is performed with five random seeds. |
| Dataset Splits | No | The paper mentions demonstration datasets Dc and Du but does not specify standard training, validation, or test splits for these datasets, nor does it refer to predefined splits from cited sources for reproducibility of data partitioning. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as CPU/GPU models, memory, or specific computing cluster configurations. |
| Software Dependencies | No | The paper mentions TRPO and Mujoco but does not specify version numbers for these or any other software dependencies, such as programming languages or deep learning frameworks. |
| Experiment Setup | Yes | The hyper-parameter τ of IC-GAIL is set to 0.7 for all tasks. ... Each experiment is performed with five random seeds. |