Ess-InfoGAIL: Semi-supervised Imitation Learning from Imbalanced Demonstrations
Authors: Huiqiao Fu, Kaiqiang Tang, Yuanyang Lu, Yiming Qi, Guizhou Deng, Flood Sung, Chunlin Chen
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate the efficiency of our method in learning multi-modal behaviors from imbalanced demonstrations compared to baseline methods. In this section, we validate our method by conducting experiments in a variety of environments, including a simple 2D trajectory environment, as well as 4 challenging MuJoCo environments (Reacher, Pusher, Walker-2D, and Humanoid). First, we perform the quantitative analysis to demonstrate the superiority of our method in discovering disentangled behavior representations from imbalanced demonstrations with limited labeled data, as compared to baseline methods. |
| Researcher Affiliation | Collaboration | Huiqiao Fu1, Kaiqiang Tang1, Yuanyang Lu1, Yiming Qi1, Guizhou Deng1, Flood Sung2, Chunlin Chen1 1Nanjing University, China, 2Moonshot AI, China |
| Pseudocode | Yes | The schematic overview of the Ess-InfoGAIL network architecture is shown in Fig. 2, and the pseudocode is shown in Algorithm 1. |
| Open Source Code | Yes | The code is available at https://github.com/tRNAoO/Ess-InfoGAIL. |
| Open Datasets | No | The paper mentions using MuJoCo environments (Reacher, Pusher, Walker-2D, Humanoid) and a 2D trajectory environment. However, it states: "we first pre-train K expert policies, each corresponding to K different goals (or K behavior modes). Subsequently, we use these K expert policies to sample K sets of expert demonstrations." It does not provide access information (link, DOI, specific citation) to the *generated or prepared datasets* used for their experiments. |
| Dataset Splits | No | The paper states: "Proportions of labeled data are set at 0.1%, 0.5%, 1%, 10%, and 100%." and "Default label ratios, if not specified, are: 2D-Trajectory: 1%, Reacher: 0.5%, Pusher: 1%, Walker-2D: 1%, Humanoid: 2%." While these are proportions of labeled data *within* the demonstrations, the paper does not specify standard train/validation/test splits for the overall datasets or the methodology for partitioning the data into such sets. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments. It only mentions the MuJoCo environments. |
| Software Dependencies | No | The paper mentions using techniques and frameworks like GAIL, InfoGAIL, PPO, GAE(λ), TD(λ), Gumbel-Softmax, and RIM. However, it does not provide specific version numbers for any of the software dependencies or libraries used. |
| Experiment Setup | No | The paper introduces weighting coefficients (λ1, λ2, λ3) and a temperature parameter (τ) but does not provide their specific numerical values or other concrete hyperparameters such as learning rates, batch sizes, or optimizer settings used during training. The section "Task setup" describes the environments but not the training configurations. |