Imitation Learning from Imperfect Demonstration

Authors: Yueh-Hua Wu, Nontawat Charoenphakdee, Han Bao, Voot Tangkaratt, Masashi Sugiyama

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5. Experiments. In this section, we aim to answer the following questions with experiments: (1) Do the 2IWIL and IC-GAIL methods allow agents to learn near-optimal policies when limited confidence information is given? (2) Are the methods robust when the given confidence is less accurate? and (3) Does more unlabeled data result in better performance in terms of average return? The discussions are given in Sec. 5.1, 5.2, and 5.3, respectively. Setup: To collect demonstration data, we train an optimal policy (π_opt) using TRPO (Schulman et al., 2015) and select two intermediate policies (π_1 and π_2). The three policies are used to generate the same number of state-action pairs. ... We compare the proposed methods against three baselines. ... To assess our methods, we conduct experiments on MuJoCo (Todorov et al., 2012). Each experiment is performed with five random seeds. (A hypothetical sketch of this demonstration-collection setup is given after the table.)
Researcher Affiliation | Academia | National Taiwan University, Taiwan; RIKEN Center for Advanced Intelligence Project, Japan; The University of Tokyo, Japan.
Pseudocode | Yes | Algorithm 1: 2IWIL. (A rough sketch of the confidence-weighted discriminator update that Algorithm 1 formalizes is given after the table.)
Open Source Code | No | The paper does not provide explicit statements or links to open-source code for the described methodology.
Open Datasets | Yes | To assess our methods, we conduct experiments on MuJoCo (Todorov et al., 2012). Each experiment is performed with five random seeds.
Dataset Splits | No | The paper mentions the demonstration datasets D_c (confidence-labeled) and D_u (unlabeled) but does not specify training, validation, or test splits for them, nor does it refer to predefined splits from cited sources.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as CPU/GPU models, memory, or computing-cluster configuration.
Software Dependencies | No | The paper mentions TRPO and MuJoCo but does not specify version numbers for these or any other software dependencies, such as programming languages or deep learning frameworks.
Experiment Setup | Yes | The hyper-parameter τ of IC-GAIL is set to 0.7 for all tasks. ... Each experiment is performed with five random seeds.
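
The Research Type row quotes the paper's demonstration-collection procedure (train π_opt with TRPO, keep two intermediate policies, and draw the same number of state-action pairs from each, over five seeds). Since no code is released, the snippet below is only a minimal sketch of that procedure under assumptions: the environment ID, the policy interface, the pair count, and the use of the classic gym rollout API are all hypothetical, and the random policy merely stands in for the trained checkpoints.

```python
# Hypothetical sketch of the demonstration-collection setup (not the authors' code).
import gym
import numpy as np

def collect_pairs(env, policy, n_pairs, seed=0):
    """Roll out `policy` until `n_pairs` (state, action) pairs are stored."""
    env.seed(seed)
    np.random.seed(seed)
    pairs, obs = [], env.reset()
    while len(pairs) < n_pairs:
        act = policy(obs)                        # policy maps observation -> action
        pairs.append((obs.copy(), np.asarray(act)))
        obs, _, done, _ = env.step(act)
        if done:
            obs = env.reset()
    return pairs

env = gym.make("HalfCheetah-v2")                 # example MuJoCo task (assumption)
stand_in_policy = lambda obs: env.action_space.sample()  # placeholder for pi_opt, pi_1, pi_2

# Equal numbers of pairs from the "optimal" and two intermediate policies,
# repeated over five random seeds as in the paper's setup.
demos = {f"seed{s}": {name: collect_pairs(env, stand_in_policy, n_pairs=1000, seed=s)
                      for name in ("pi_opt", "pi_1", "pi_2")}
         for s in range(5)}
```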
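The Pseudocode row refers to Algorithm 1 (2IWIL), whose core idea is to reweight demonstration samples in a GAIL-style discriminator objective by their given or predicted confidence. The PyTorch sketch below illustrates that idea only; the network architecture, the normalization of the weights, and all function names are assumptions rather than the authors' implementation.

```python
# Rough illustration of a confidence-weighted discriminator loss (assumed form).
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        # Returns raw logits for "expert vs. agent" classification.
        return self.net(torch.cat([obs, act], dim=-1))

def weighted_discriminator_loss(disc, demo_obs, demo_act, demo_conf,
                                agent_obs, agent_act):
    """Demonstration pairs are importance-weighted by confidence, so that
    low-quality demonstrations contribute less to the 'expert' class."""
    bce = nn.BCEWithLogitsLoss(reduction="none")
    demo_logits = disc(demo_obs, demo_act).squeeze(-1)
    agent_logits = disc(agent_obs, agent_act).squeeze(-1)
    w = demo_conf / (demo_conf.mean() + 1e-8)     # normalized importance weights
    demo_loss = (w * bce(demo_logits, torch.ones_like(demo_logits))).mean()
    agent_loss = bce(agent_logits, torch.zeros_like(agent_logits)).mean()
    return demo_loss + agent_loss
```

In 2IWIL the confidence scores for unlabeled demonstrations are first predicted by a classifier trained semi-supervisedly from the small confidence-labeled set; the sketch above only covers the subsequent reweighted adversarial step.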