OLLIE: Imitation Learning from Offline Pretraining to Online Finetuning
Authors: Sheng Yue, Xingyuan Hua, Ju Ren, Sen Lin, Junshan Zhang, Yaoxue Zhang
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, OLLIE consistently and significantly outperforms the baseline methods in 20 challenging tasks, from continuous control to vision-based domains, in terms of performance, demonstration efficiency, and convergence speed. |
| Researcher Affiliation | Academia | (1) Department of Computer Science and Technology, Tsinghua University, Beijing, China; (2) Zhongguancun Laboratory, Beijing, China; (3) Department of Computer Science, University of Houston, Texas, US; (4) Department of Electrical and Computer Engineering, University of California, Davis, US. Correspondence to: Ju Ren <renju@tsinghua.edu.cn>. |
| Pseudocode | Yes | Algorithm 1 Offline-to-online imitation learning (OLLIE) |
| Open Source Code | Yes | The code is available at https://github.com/HansenHua/OLLIE-offline-to-online-imitation-learning. |
| Open Datasets | Yes | During offline training, we use the D4RL datasets (Fu et al., 2020) for AntMaze, MuJoCo, Adroit, and Franka Kitchen and use the robomimic (Mandlekar et al., 2022) datasets for vision-based Robomimic. |
| Dataset Splits | No | The paper describes its evaluation process (e.g., “running it in the environment for 10 episodes and computing the average undiscounted return”) and uses multiple seeds, but it does not provide specific dataset splits like percentages or sample counts for training, validation, or testing sets. |
| Hardware Specification | Yes | All the experiments are run on Ubuntu 20.04.2 LTS with 8 NVIDIA GeForce RTX 4090 GPUs. |
| Software Dependencies | Yes | We implement our code using PyTorch 1.8.1, built upon the open-source framework of offline RL algorithms, provided at https://github.com/tinkoff-ai/CORL (under the Apache-2.0 License) and the implementation of DWBC, provided at https://github.com/ryanxhr/DWBC (under the MIT License). |
| Experiment Setup | Yes | Our method is straightforward to implement and forgiving to hyperparameters. Of note, except for the network structures in vision-based tasks (requiring the employment of CNNs), all hyperparameters are identical across tasks and settings. ... The hyperparameters are summarized in Table 5. |