Ess-InfoGAIL: Semi-supervised Imitation Learning from Imbalanced Demonstrations
Authors: Huiqiao Fu, Kaiqiang Tang, Yuanyang Lu, Yiming Qi, Guizhou Deng, Flood Sung, Chunlin Chen
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate the efficiency of our method in learning multi-modal behaviors from imbalanced demonstrations compared to baseline methods. In this section, we validate our method by conducting experiments in a variety of environments, including a simple 2D trajectory environment, as well as 4 challenging MuJoCo environments (Reacher, Pusher, Walker-2D, and Humanoid). First, we perform the quantitative analysis to demonstrate the superiority of our method in discovering disentangled behavior representations from imbalanced demonstrations with limited labeled data, as compared to baseline methods. |
| Researcher Affiliation | Collaboration | Huiqiao Fu1, Kaiqiang Tang1, Yuanyang Lu1, Yiming Qi1, Guizhou Deng1, Flood Sung2, Chunlin Chen1 1Nanjing University, China, 2Moonshot AI, China |
| Pseudocode | Yes | The schematic overview of the Ess-InfoGAIL network architecture is shown in Fig. 2, and the pseudocode is shown in Algorithm 1. |
| Open Source Code | Yes | The code is available at https://github.com/tRNAoO/Ess-InfoGAIL. |
| Open Datasets | No | The paper mentions using MuJoCo environments (Reacher, Pusher, Walker-2D, Humanoid) and a 2D trajectory environment. However, it states: "we first pre-train K expert policies, each corresponding to K different goals (or K behavior modes). Subsequently, we use these K expert policies to sample K sets of expert demonstrations." It does not provide access information (link, DOI, specific citation) to the *generated or prepared datasets* used for their experiments. |
| Dataset Splits | No | The paper states: "Proportions of labeled data are set at 0.1%, 0.5%, 1%, 10%, and 100%." and "Default label ratios, if not specified, are: 2D-Trajectory: 1%, Reacher: 0.5%, Pusher: 1%, Walker-2D: 1%, Humanoid: 2%." While these are proportions of labeled data *within* the demonstrations, the paper does not specify standard train/validation/test splits for the overall datasets or the methodology for partitioning the data into such sets. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments. It only mentions the MuJoCo environments. |
| Software Dependencies | No | The paper mentions using techniques and frameworks like GAIL, InfoGAIL, PPO, GAE(λ), TD(λ), Gumbel-Softmax, and RIM. However, it does not provide specific version numbers for any of the software dependencies or libraries used. |
| Experiment Setup | No | The paper introduces weighting coefficients (λ1, λ2, λ3) and a temperature parameter (τ) but does not provide their specific numerical values or other concrete hyperparameters such as learning rates, batch sizes, or optimizer settings used during training. The section "Task setup" describes the environments but not the training configurations. |