OLLIE: Imitation Learning from Offline Pretraining to Online Finetuning

Authors: Sheng Yue, Xingyuan Hua, Ju Ren, Sen Lin, Junshan Zhang, Yaoxue Zhang

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, OLLIE consistently and significantly outperforms the baseline methods in 20 challenging tasks, from continuous control to vision-based domains, in terms of performance, demonstration efficiency, and convergence speed.
Researcher Affiliation | Academia | 1 Department of Computer Science and Technology, Tsinghua University, Beijing, China 2 Zhongguancun Laboratory, Beijing, China 3 Department of Computer Science, University of Houston, Texas, US 4 Department of Electrical and Computer Engineering, University of California, Davis, US. Correspondence to: Ju Ren <renju@tsinghua.edu.cn>.
Pseudocode | Yes | Algorithm 1 Offline-to-online imitation learning (OLLIE)
Open Source Code | Yes | The code is available at https://github.com/HansenHua/OLLIE-offline-to-online-imitation-learning.
Open Datasets | Yes | During offline training, we use the D4RL datasets (Fu et al., 2020) for AntMaze, MuJoCo, Adroit, and Franka Kitchen and use the robomimic (Mandlekar et al., 2022) datasets for vision-based Robomimic. (A hedged loading sketch follows the table.)
Dataset Splits | No | The paper describes its evaluation process (e.g., “running it in the environment for 10 episodes and computing the average undiscounted return”) and uses multiple seeds, but it does not provide specific dataset splits, such as percentages or sample counts for training, validation, or test sets. (See the evaluation sketch after the table.)
Hardware Specification | Yes | All the experiments are run on Ubuntu 20.04.2 LTS with 8 NVIDIA GeForce RTX 4090 GPUs.
Software Dependencies | Yes | We implement our code using PyTorch 1.8.1, built upon the open-source framework of offline RL algorithms, provided at https://github.com/tinkoff-ai/CORL (under the Apache-2.0 License) and the implementation of DWBC, provided at https://github.com/ryanxhr/DWBC (under the MIT License).
Experiment Setup | Yes | Our method is straightforward to implement and forgiving to hyperparameters. Of note, except for the network structures in vision-based tasks (requiring the employment of CNNs), all hyperparameters are identical across tasks and settings. ... The hyperparameters are summarized in Table 5.
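
For the Open Datasets row, the following is a minimal sketch of how one of the reported D4RL datasets can be loaded. The specific task id ("antmaze-umaze-v2") and the access calls reflect the public D4RL interface and are assumptions for illustration, not details taken from the paper.

import gym
import d4rl  # importing d4rl registers the benchmark environments with gym

# Hypothetical task id; the paper only names the AntMaze, MuJoCo, Adroit, and Franka Kitchen suites.
env = gym.make("antmaze-umaze-v2")

# Raw dataset: a dict of numpy arrays keyed by 'observations', 'actions', 'rewards', 'terminals', ...
dataset = env.get_dataset()
print(dataset["observations"].shape, dataset["actions"].shape)

# Transition-aligned view (s, a, r, s', done), convenient for training from offline data.
transitions = d4rl.qlearning_dataset(env)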
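
For the Dataset Splits row, below is a minimal sketch of the quoted evaluation protocol (10 episodes, average undiscounted return). The policy interface and the per-episode seeding are assumptions; the step API shown is the pre-0.26 gym four-tuple used by D4RL environments.

import numpy as np

def evaluate(policy, env, num_episodes=10, seed=0):
    """Average undiscounted return of a policy over a fixed number of evaluation episodes."""
    returns = []
    for ep in range(num_episodes):
        env.seed(seed + ep)  # assumed per-episode seeding; old gym API
        obs, done, ep_return = env.reset(), False, 0.0
        while not done:
            action = policy(obs)  # assumed interface: observation -> action
            obs, reward, done, _ = env.step(action)
            ep_return += reward  # undiscounted: plain sum of rewards, no discount factor
        returns.append(ep_return)
    return float(np.mean(returns))

Reported scores would then be averaged over multiple random seeds, consistent with the paper's evaluation description.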