Imitation Learning from Imperfect Demonstration
Authors: Yueh-Hua Wu, Nontawat Charoenphakdee, Han Bao, Voot Tangkaratt, Masashi Sugiyama
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5. Experiments In this section, we aim to answer the following questions with experiments. (1) Do 2IWIL and IC-GAIL methods allow agents to learn near-optimal policies when limited confidence information is given? (2) Are the methods robust enough when the given confidence is less accurate? and (3) Do more unlabeled data results in better performance in terms of average return? The discussions are given in Sec. 5.1, 5.2, and 5.3 respectively. Setup To collect demonstration data, we train an optimal policy (πopt) using TRPO (Schulman et al., 2015) and select two intermediate policies (π1 and π2). The three policies are used to generate the same number of state-action pairs. ... We compare the proposed methods against three baselines. ... To assess our methods, we conduct experiments on Mujoco (Todorov et al., 2012). Each experiment is performed with five random seeds. |
| Researcher Affiliation | Academia | 1National Taiwan University, Taiwan 2RIKEN Center for Advanced Intelligence Project, Japan 3The University of Tokyo, Japan. |
| Pseudocode | Yes | Algorithm 1 2IWIL |
| Open Source Code | No | The paper does not provide explicit statements or links to open-source code for the described methodology. |
| Open Datasets | Yes | To assess our methods, we conduct experiments on Mujoco (Todorov et al., 2012). Each experiment is performed with five random seeds. |
| Dataset Splits | No | The paper mentions demonstration datasets Dc and Du but does not specify standard training, validation, or test splits for these datasets, nor does it refer to predefined splits from cited sources for reproducibility of data partitioning. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as CPU/GPU models, memory, or specific computing cluster configurations. |
| Software Dependencies | No | The paper mentions TRPO and Mujoco but does not specify version numbers for these or any other software dependencies, such as programming languages or deep learning frameworks. |
| Experiment Setup | Yes | The hyper-parameter τ of IC-GAIL is set to 0.7 for all tasks. ... Each experiment is performed with five random seeds. |