OLLIE: Imitation Learning from Offline Pretraining to Online Finetuning

Authors: Sheng Yue, Xingyuan Hua, Ju Ren, Sen Lin, Junshan Zhang, Yaoxue Zhang

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, OLLIE consistently and significantly outperforms the baseline methods in 20 challenging tasks, from continuous control to vision-based domains, in terms of performance, demonstration efficiency, and convergence speed.
Researcher Affiliation | Academia | 1 Department of Computer Science and Technology, Tsinghua University, Beijing, China 2 Zhongguancun Laboratory, Beijing, China 3 Department of Computer Science, University of Houston, Texas, US 4 Department of Electrical and Computer Engineering, University of California, Davis, US. Correspondence to: Ju Ren <renju@tsinghua.edu.cn>.
Pseudocode | Yes | Algorithm 1 Offline-to-online imitation learning (OLLIE)
Open Source Code | Yes | The code is available at https://github.com/HansenHua/OLLIE-offline-to-online-imitation-learning.
Open Datasets | Yes | During offline training, we use the D4RL datasets (Fu et al., 2020) for AntMaze, MuJoCo, Adroit, and Franka Kitchen and use the robomimic (Mandlekar et al., 2022) datasets for vision-based Robomimic. (A hedged loading sketch follows the table.)
Dataset Splits | No | The paper describes its evaluation process (e.g., “running it in the environment for 10 episodes and computing the average undiscounted return”) and uses multiple seeds, but it does not provide specific dataset splits, such as percentages or sample counts for training, validation, or test sets. (See the evaluation sketch after the table.)
Hardware Specification | Yes | All the experiments are run on Ubuntu 20.04.2 LTS with 8 NVIDIA GeForce RTX 4090 GPUs.
Software Dependencies | Yes | We implement our code using PyTorch 1.8.1, built upon the open-source framework of offline RL algorithms, provided at https://github.com/tinkoff-ai/CORL (under the Apache-2.0 License) and the implementation of DWBC, provided at https://github.com/ryanxhr/DWBC (under the MIT License).
Experiment Setup | Yes | Our method is straightforward to implement and forgiving to hyperparameters. Of note, except for the network structures in vision-based tasks (requiring the employment of CNNs), all hyperparameters are identical across tasks and settings. ... The hyperparameters are summarized in Table 5.
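
For the Open Datasets row, the following is a minimal sketch of how one of the reported D4RL datasets can be loaded. The specific task id ("antmaze-umaze-v2") and the access calls reflect the public D4RL interface and are assumptions for illustration, not details taken from the paper.

import gym
import d4rl  # importing d4rl registers the benchmark environments with gym

# Hypothetical task id; the paper only names the AntMaze, MuJoCo, Adroit, and Franka Kitchen suites.
env = gym.make("antmaze-umaze-v2")

# Raw dataset: a dict of numpy arrays keyed by 'observations', 'actions', 'rewards', 'terminals', ...
dataset = env.get_dataset()
print(dataset["observations"].shape, dataset["actions"].shape)

# Transition-aligned view (s, a, r, s', done), convenient for training from offline data.
transitions = d4rl.qlearning_dataset(env)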
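
For the Dataset Splits row, below is a minimal sketch of the quoted evaluation protocol (10 episodes, average undiscounted return). The policy interface and the per-episode seeding are assumptions; the step API shown is the pre-0.26 gym four-tuple used by D4RL environments.

import numpy as np

def evaluate(policy, env, num_episodes=10, seed=0):
    """Average undiscounted return of a policy over a fixed number of evaluation episodes."""
    returns = []
    for ep in range(num_episodes):
        env.seed(seed + ep)  # assumed per-episode seeding; old gym API
        obs, done, ep_return = env.reset(), False, 0.0
        while not done:
            action = policy(obs)  # assumed interface: observation -> action
            obs, reward, done, _ = env.step(action)
            ep_return += reward  # undiscounted: plain sum of rewards, no discount factor
        returns.append(ep_return)
    return float(np.mean(returns))

Reported scores would then be averaged over multiple random seeds, consistent with the paper's evaluation description.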