RLIF: Interactive Imitation Learning as Reinforcement Learning

Authors: Jianlan Luo, Perry Dong, Yuexiang Zhai, Yi Ma, Sergey Levine

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We then evaluate our method on challenging high-dimensional continuous control simulation benchmarks as well as real-world robotic vision-based manipulation tasks. The results show that it strongly outperforms DAgger-like approaches across the different tasks, especially when the intervening experts are suboptimal. Additional ablations also empirically verify the proposed theoretical justification that the performance of our method is associated with the choice of intervention model and the suboptimality of the expert.
Researcher Affiliation | Academia | Jianlan Luo, Perry Dong, Yuexiang Zhai, Yi Ma, Sergey Levine; UC Berkeley; {jianlanluo, perrydong}@berkeley.edu
Pseudocode | Yes | Algorithm 1 (Interactive imitation), Algorithm 2 (RLIF); a minimal sketch of the RLIF loop follows the table.
Open Source Code | Yes | Code and videos can be found on the project website: rlif-page.github.io
Open Datasets | Yes | We use Gym locomotion and Adroit dexterous manipulation tasks in these experiments, based on the D4RL environments (Fu et al., 2020b). The initial datasets of all simulation tasks are subsets of the datasets provided in D4RL. The specific datasets used for subsampling and the sizes of the initial dataset for each task are listed in Table 2 (a loading/subsampling sketch follows the table).
Dataset Splits | No | The paper mentions training and testing on datasets but does not explicitly provide training/validation/test split percentages or counts. It refers to D4RL environments, which have standard splits, but does not explicitly state the specific splits used.
Hardware Specification | Yes | For real robot experiments, we perform the tasks of peg insertion into a 3D-printed board and cloth unfolding with velcro hooks using a 7-DoF Franka Research 3 robot arm. The robot obtains visual feedback from the Intel RealSense D405 cameras mounted on its end-effectors.
Software Dependencies | Yes | We use an ImageNet pre-trained EfficientNet-B3 (Tan & Le, 2019) as a vision backbone for faster policy training (a backbone-loading sketch follows the table).
Experiment Setup | Yes | Training parameters: we set the number of rounds to N = 100 and the number of trajectories collected per round to 5. We set the number of pretraining epochs to 200 and the pretraining train steps per epoch to 300, and the epochs and train steps per epoch for each round to 25 and 100, to achieve consistent training (a config sketch follows the table). Table 5 lists the RLIF and HG-DAgger parameters for each simulation task; Table 6 lists the RLIF and HG-DAgger parameters for the insertion task on the Franka robot.
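The pseudocode row above refers to Algorithm 2 (RLIF). As a rough illustration of how intervention signals can stand in for rewards, here is a minimal Python sketch of one data-collection round. It assumes a 0/-1 reward convention (reward -1 on steps where the expert intervenes, 0 otherwise) and an off-policy RL learner; the env/agent/expert/replay_buffer interfaces and helper names are hypothetical, not the paper's actual implementation.

```python
def rlif_data_collection_round(env, agent, expert, replay_buffer,
                               max_steps=1000, num_updates=1000, batch_size=256):
    """One RLIF-style round: roll out the learner, let the expert intervene,
    and relabel rewards from the intervention signal (sketch only)."""
    obs = env.reset()
    for _ in range(max_steps):
        # The expert decides whether to take over; this intervention signal is
        # the only supervision that gets turned into a reward.
        intervene = expert.wants_to_intervene(obs)      # hypothetical interface
        action = expert.act(obs) if intervene else agent.act(obs)
        next_obs, _, done, _ = env.step(action)         # environment reward ignored
        # Assumed reward convention: -1 on intervention steps, 0 otherwise.
        reward = -1.0 if intervene else 0.0
        replay_buffer.add(obs, action, reward, next_obs, done)
        obs = env.reset() if done else next_obs
    # Off-policy RL updates on the relabeled data (the update call is a placeholder
    # for whatever actor-critic learner is plugged in).
    for _ in range(num_updates):
        agent.update(replay_buffer.sample(batch_size))
```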
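The open-datasets row points at D4RL subsets whose exact names and sizes are given in the paper's Table 2. A hedged example of loading a D4RL task and subsampling an initial dataset might look like the following; the task name "hopper-medium-v2" and the 5,000-transition size are placeholders, not the paper's actual choices.

```python
import gym
import d4rl  # noqa: F401  (importing d4rl registers its environments with gym)
import numpy as np

# Placeholder task and subsample size; the actual datasets and initial-dataset
# sizes per task are listed in the paper's Table 2.
env = gym.make("hopper-medium-v2")
dataset = d4rl.qlearning_dataset(env)  # dict: observations, actions, rewards, ...

rng = np.random.default_rng(0)
num_transitions = dataset["observations"].shape[0]
idx = rng.choice(num_transitions, size=5000, replace=False)
initial_dataset = {key: value[idx] for key, value in dataset.items()}
```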
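The software-dependencies row mentions an ImageNet pre-trained EfficientNet-B3 vision backbone. The paper does not state which framework is used for this; purely as an illustration, one common way to obtain such a backbone with torchvision is:

```python
import torch
import torchvision

# Illustrative only: load an ImageNet pre-trained EfficientNet-B3 and drop the
# classification head so it yields pooled visual features for a policy network.
weights = torchvision.models.EfficientNet_B3_Weights.IMAGENET1K_V1
backbone = torchvision.models.efficientnet_b3(weights=weights)
backbone.classifier = torch.nn.Identity()

with torch.no_grad():
    features = backbone(torch.zeros(1, 3, 300, 300))  # (1, 1536) feature vector
```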
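Finally, the experiment-setup row quotes the training parameters. A hypothetical config layout collecting those values is sketched below; the key names are illustrative, and only the numbers come from the quoted text.

```python
# Key names are illustrative; the numeric values are the ones quoted above.
RLIF_TRAINING_CONFIG = {
    "num_rounds": 100,                  # N, data-collection rounds
    "trajectories_per_round": 5,
    "pretraining_epochs": 200,
    "pretraining_steps_per_epoch": 300,
    "epochs_per_round": 25,
    "train_steps_per_epoch": 100,
}
```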