RLIF: Interactive Imitation Learning as Reinforcement Learning

Authors: Jianlan Luo, Perry Dong, Yuexiang Zhai, Yi Ma, Sergey Levine

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We then evaluate our method on challenging high-dimensional continuous control simulation benchmarks as well as real-world robotic vision-based manipulation tasks. The results show that it strongly outperforms DAgger-like approaches across the different tasks, especially when the intervening experts are suboptimal. Additional ablations also empirically verify the proposed theoretical justification that the performance of our method is associated with the choice of intervention model and the suboptimality of the expert.
Researcher Affiliation | Academia | Jianlan Luo, Perry Dong, Yuexiang Zhai, Yi Ma, Sergey Levine; UC Berkeley; {jianlanluo, perrydong}@berkeley.edu
Pseudocode | Yes | Algorithm 1 (Interactive imitation), Algorithm 2 (RLIF); a minimal sketch of the RLIF loop follows the table.
Open Source Code | Yes | Code and videos can be found on the project website: rlif-page.github.io
Open Datasets | Yes | We use Gym locomotion and Adroit dexterous manipulation tasks in these experiments, based on the D4RL environments (Fu et al., 2020b). The initial datasets of all simulation tasks are subsets of the datasets provided in D4RL. The specific datasets used for subsampling and the sizes of the initial dataset for each task are listed in Table 2 (a loading/subsampling sketch follows the table).
Dataset Splits | No | The paper mentions training and testing on datasets but does not explicitly provide training/validation/test split percentages or counts. It refers to D4RL environments, which have standard splits, but does not explicitly state the specific splits used.
Hardware Specification | Yes | For real robot experiments, we perform the tasks of peg insertion into a 3D-printed board and cloth unfolding with velcro hooks using a 7-DoF Franka Research 3 robot arm. The robot obtains visual feedback from the Intel RealSense D405 cameras mounted on its end-effectors.
Software Dependencies | Yes | We use an ImageNet pre-trained EfficientNet-B3 (Tan & Le, 2019) as a vision backbone for faster policy training (a backbone-loading sketch follows the table).
Experiment Setup | Yes | Training parameters: we set the number of rounds to N = 100 and the number of trajectories collected per round to 5. We set the number of pretraining epochs to 200 and the pretraining train steps per epoch to 300, and the epochs and train steps per epoch for each round to 25 and 100, to achieve consistent training (a config sketch follows the table). Table 5 lists the RLIF and HG-DAgger parameters for each simulation task; Table 6 lists the RLIF and HG-DAgger parameters for the insertion task on the Franka robot.
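The pseudocode row above refers to Algorithm 2 (RLIF). As a rough illustration of how intervention signals can stand in for rewards, here is a minimal Python sketch of one data-collection round. It assumes a 0/-1 reward convention (reward -1 on steps where the expert intervenes, 0 otherwise) and an off-policy RL learner; the env/agent/expert/replay_buffer interfaces and helper names are hypothetical, not the paper's actual implementation.

```python
def rlif_data_collection_round(env, agent, expert, replay_buffer,
                               max_steps=1000, num_updates=1000, batch_size=256):
    """One RLIF-style round: roll out the learner, let the expert intervene,
    and relabel rewards from the intervention signal (sketch only)."""
    obs = env.reset()
    for _ in range(max_steps):
        # The expert decides whether to take over; this intervention signal is
        # the only supervision that gets turned into a reward.
        intervene = expert.wants_to_intervene(obs)      # hypothetical interface
        action = expert.act(obs) if intervene else agent.act(obs)
        next_obs, _, done, _ = env.step(action)         # environment reward ignored
        # Assumed reward convention: -1 on intervention steps, 0 otherwise.
        reward = -1.0 if intervene else 0.0
        replay_buffer.add(obs, action, reward, next_obs, done)
        obs = env.reset() if done else next_obs
    # Off-policy RL updates on the relabeled data (the update call is a placeholder
    # for whatever actor-critic learner is plugged in).
    for _ in range(num_updates):
        agent.update(replay_buffer.sample(batch_size))
```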
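The open-datasets row points at D4RL subsets whose exact names and sizes are given in the paper's Table 2. A hedged example of loading a D4RL task and subsampling an initial dataset might look like the following; the task name "hopper-medium-v2" and the 5,000-transition size are placeholders, not the paper's actual choices.

```python
import gym
import d4rl  # noqa: F401  (importing d4rl registers its environments with gym)
import numpy as np

# Placeholder task and subsample size; the actual datasets and initial-dataset
# sizes per task are listed in the paper's Table 2.
env = gym.make("hopper-medium-v2")
dataset = d4rl.qlearning_dataset(env)  # dict: observations, actions, rewards, ...

rng = np.random.default_rng(0)
num_transitions = dataset["observations"].shape[0]
idx = rng.choice(num_transitions, size=5000, replace=False)
initial_dataset = {key: value[idx] for key, value in dataset.items()}
```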
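The software-dependencies row mentions an ImageNet pre-trained EfficientNet-B3 vision backbone. The paper does not state which framework is used for this; purely as an illustration, one common way to obtain such a backbone with torchvision is:

```python
import torch
import torchvision

# Illustrative only: load an ImageNet pre-trained EfficientNet-B3 and drop the
# classification head so it yields pooled visual features for a policy network.
weights = torchvision.models.EfficientNet_B3_Weights.IMAGENET1K_V1
backbone = torchvision.models.efficientnet_b3(weights=weights)
backbone.classifier = torch.nn.Identity()

with torch.no_grad():
    features = backbone(torch.zeros(1, 3, 300, 300))  # (1, 1536) feature vector
```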
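Finally, the experiment-setup row quotes the training parameters. A hypothetical config layout collecting those values is sketched below; the key names are illustrative, and only the numbers come from the quoted text.

```python
# Key names are illustrative; the numeric values are the ones quoted above.
RLIF_TRAINING_CONFIG = {
    "num_rounds": 100,                  # N, data-collection rounds
    "trajectories_per_round": 5,
    "pretraining_epochs": 200,
    "pretraining_steps_per_epoch": 300,
    "epochs_per_round": 25,
    "train_steps_per_epoch": 100,
}
```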