How to Leverage Diverse Demonstrations in Offline Imitation Learning

Authors: Sheng Yue, Jiani Liu, Xingyuan Hua, Ju Ren, Sen Lin, Junshan Zhang, Yaoxue Zhang

ICML 2024

Reproducibility Variable Result LLM Response
Research Type Experimental In the experiments, we evaluate our method on a suite of complex and high-dimensional offline IL benchmarks, including continuous-control and vision-based tasks. The results demonstrate that our method achieves state-of-the-art performance, outperforming existing methods on 20/21 benchmarks, typically by 2-5x, while maintaining a comparable runtime to Behavior Cloning (BC).
Researcher Affiliation Academia 1Department of Computer Science and Technology, Tsinghua University, Beijing, China 2Zhongguancun Laboratory, Beijing, China 3Department of Computer Science, University of Houston, Texas, US 4Department of Electrical and Computer Engineering, University of California, Davis, US.
Pseudocode Yes Algorithm 1 (ILID). Require: expert data D_e, imperfect data D_b, rollback step K.
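The algorithm header only lists the inputs. A speculative plain-Python sketch of one plausible reading of the "rollback K" step follows; the function name, the per-state discriminator scores, and the selection rule are all assumptions for illustration, not the paper's code:

```python
# Speculative sketch: given per-state discriminator scores along one imperfect
# trajectory, keep every state whose score exceeds the threshold sigma together
# with the K transitions leading up to it ("rollback"). All names are assumed.
def select_with_rollback(scores, sigma=0.2, K=20):
    keep = set()
    for i, s in enumerate(scores):
        if s > sigma:                     # state identified as expert-like
            for j in range(max(0, i - K), i + 1):
                keep.add(j)               # roll back K steps and keep them too
    return sorted(keep)

# Example: only index 25 scores above sigma, so indices 5..25 are kept.
scores = [0.1] * 30
scores[25] = 0.5
print(select_with_rollback(scores, sigma=0.2, K=20))
```

The default values sigma=0.2 and K=20 mirror the thresholds reported in the experiment setup below.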
Open Source Code Yes The code is available at https://github.com/Hansen-Hua/ILID-offline-imitation-learning.
Open Datasets Yes We employ the D4RL datasets (Fu et al., 2020) for AntMaze, MuJoCo, Adroit, and Franka Kitchen, and use the robomimic (Mandlekar et al., 2021) datasets for the vision-based Robomimic tasks.
Dataset Splits No The paper describes using expert and imperfect datasets for training and evaluating the learned policy in the environment. However, it does not specify explicit numerical training/validation/test dataset splits (e.g., percentages or counts) for its own experimental setup, nor does it reference predefined dataset splits in terms of train/validation/test partitions for the datasets used.
Hardware Specification Yes All the experiments are run on Ubuntu 20.04.2 LTS with 8 NVIDIA GeForce RTX 4090 GPUs.
Software Dependencies Yes We implement our code using PyTorch 1.8.1, built upon the open-source framework of offline RL algorithms provided at https://github.com/tinkoff-ai/CORL (under the Apache-2.0 License) and the implementation of DWBC provided at https://github.com/ryanxhr/DWBC (under the MIT License).
Experiment Setup Yes We represent the policy as a 2-layer feedforward neural network with 256 hidden units, ReLU activation functions, and tanh-Gaussian outputs. Analogously, the discriminators are represented as a 2-layer feedforward neural network with 256 hidden units and ReLU activations, with the output clipped to [0.1, 0.9]. For vision-based tasks, we change the network architectures to a simple CNN consisting of two convolutional layers, each with a 3×3 convolutional kernel and 2×2 max pooling. We adopt Adam as the optimizer. All learning rates and batch sizes are set to 1e-5 and 256, respectively. The threshold σ for identifying expert states is set to 0.2, and the rollback step K is set to 20. The hyperparameters are summarized in Table 4.
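The architectures described above can be sketched directly in PyTorch. This is a minimal illustration of the stated specification (2-layer MLPs with 256 hidden units, ReLU, a tanh-Gaussian policy head, discriminator output clipped to [0.1, 0.9], Adam with lr 1e-5), not the authors' released code; the class names and the observation/action dimensions are assumptions:

```python
# Hedged sketch of the reported architectures; obs_dim/act_dim are placeholders.
import torch
import torch.nn as nn

class TanhGaussianPolicy(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mu = nn.Linear(hidden, act_dim)       # mean head
        self.log_std = nn.Linear(hidden, act_dim)  # log-std head

    def forward(self, obs):
        h = self.trunk(obs)
        mu = self.mu(h)
        log_std = self.log_std(h).clamp(-5.0, 2.0)
        # Reparameterized sample squashed by tanh, so actions lie in (-1, 1)
        eps = torch.randn_like(mu)
        return torch.tanh(mu + eps * log_std.exp())

class Discriminator(nn.Module):
    def __init__(self, in_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        # Output clipped to [0.1, 0.9] as described in the setup
        return self.net(x).clamp(0.1, 0.9)

policy = TanhGaussianPolicy(obs_dim=11, act_dim=3)
disc = Discriminator(in_dim=11 + 3)
# Optimizer and learning rate as reported (Adam, lr 1e-5)
opt = torch.optim.Adam(
    list(policy.parameters()) + list(disc.parameters()), lr=1e-5
)
```

Clipping the discriminator output away from 0 and 1 is a common trick to keep the induced log-ratios bounded and the training stable.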