RLIF: Interactive Imitation Learning as Reinforcement Learning
Authors: Jianlan Luo, Perry Dong, Yuexiang Zhai, Yi Ma, Sergey Levine
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We then evaluate our method on challenging high-dimensional continuous control simulation benchmarks as well as real-world robotic vision-based manipulation tasks. The results show that it strongly outperforms DAgger-like approaches across the different tasks, especially when the intervening experts are suboptimal. Additional ablations also empirically verify the proposed theoretical justification that the performance of our method is associated with the choice of intervention model and suboptimality of the expert. |
| Researcher Affiliation | Academia | Jianlan Luo Perry Dong Yuexiang Zhai Yi Ma Sergey Levine UC Berkeley; {jianlanluo, perrydong}@berkeley.edu |
| Pseudocode | Yes | Algorithm 1 (Interactive Imitation) and Algorithm 2 (RLIF); a minimal sketch of the RLIF loop follows the table. |
| Open Source Code | Yes | Code and videos can be found on the project website: rlif-page.github.io |
| Open Datasets | Yes | We use Gym locomotion and Adroit dexterous manipulation tasks in these experiments, based on the D4RL environments (Fu et al., 2020b). The initial datasets of all simulation tasks are subsets of the datasets provided in D4RL. The specific dataset subsampled and the size of the initial dataset for each task are listed in Table 2. A loading sketch follows the table. |
| Dataset Splits | No | The paper mentions training and testing on datasets but does not explicitly provide training/validation/test split percentages or counts. It refers to the D4RL environments, which have standard splits, but does not explicitly state the specific splits used. |
| Hardware Specification | Yes | For real robot experiments, we perform the task of peg insertion into a 3D-printed board and cloth unfolding with velcro hooks using a 7-DoF Franka Research 3 robot arm. The robot obtains visual feedback from the Intel RealSense D405 cameras mounted on its end-effectors. |
| Software Dependencies | Yes | We use an ImageNet pre-trained EfficientNet-B3 (Tan & Le, 2019) as a vision backbone for faster policy training. A backbone sketch follows the table. |
| Experiment Setup | Yes | Training Parameters. We set the number of rounds to N = 100 and the number of trajectories collected per round to 5. We set the number of pretraining epochs to 200 with 300 train steps per epoch, and use 25 epochs with 100 train steps per epoch for each round to achieve consistent training. Table 5: RLIF and HG-DAgger parameters for each simulation task. Table 6: RLIF and HG-DAgger parameters for the insertion task on the Franka robot. A consolidated config sketch follows the table. |
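
The pseudocode row points at Algorithm 2 (RLIF), whose core idea is to run off-policy RL with a reward derived purely from expert interventions rather than from the task. The sketch below is a minimal, illustrative rendering of that data-collection loop, not the authors' code: the `ReplayBuffer`, the per-step intervention test, and the 0/-1 reward labeling are simplifying assumptions, and the paper pairs this loop with an RLPD-style off-policy learner rather than anything shown here.

```python
# Hedged sketch of RLIF's intervention-labeled data collection (Algorithm 2).
# All names below are illustrative stand-ins; the intervention model is
# simplified to a per-step expert takeover.

class ReplayBuffer:
    """Stores (obs, action, reward, next_obs, done) transitions."""
    def __init__(self):
        self.transitions = []

    def add(self, obs, action, reward, next_obs, done):
        self.transitions.append((obs, action, reward, next_obs, done))


def collect_rlif_trajectory(env, policy, expert_intervenes, expert_policy,
                            buffer, max_steps=1000):
    """One rollout: reward is -1 on steps where the expert intervenes and 0
    otherwise; the environment's true task reward is never used."""
    obs, done, t = env.reset(), False, 0
    while not done and t < max_steps:
        action = policy(obs)
        intervened = expert_intervenes(obs, action)
        if intervened:
            action = expert_policy(obs)  # expert takes over control
        next_obs, _, done, _ = env.step(action)  # task reward discarded
        buffer.add(obs, action, -1.0 if intervened else 0.0, next_obs, done)
        obs, t = next_obs, t + 1
    return buffer
```

Between rounds, an off-policy RL update is run on the buffer; maximizing this reward amounts to minimizing the rate at which the expert feels compelled to intervene.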
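
For the open-datasets row: the initial offline data is a subsample of standard D4RL datasets. Below is a minimal loading sketch assuming the `gym` and `d4rl` packages; the task name and subset size are placeholders, with the actual per-task sizes given in the paper's Table 2.

```python
# Illustrative subsampling of a D4RL dataset to form an initial buffer.
import gym
import d4rl  # noqa: F401 -- importing registers the D4RL environments
import numpy as np

env = gym.make("hopper-medium-v2")       # placeholder task choice
dataset = d4rl.qlearning_dataset(env)    # dict of parallel numpy arrays

rng = np.random.default_rng(0)
n_init = 5000                            # placeholder; see the paper's Table 2
idx = rng.choice(len(dataset["observations"]), size=n_init, replace=False)
initial_dataset = {key: value[idx] for key, value in dataset.items()}
```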
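
For the software-dependencies row: the paper names an ImageNet-pretrained EfficientNet-B3 as the vision backbone but does not state the framework, so the torchvision-based sketch below is an assumption; freezing the backbone is likewise an illustrative choice, not something the paper specifies.

```python
# Sketch: ImageNet-pretrained EfficientNet-B3 as a feature extractor.
import torch
from torchvision.models import efficientnet_b3, EfficientNet_B3_Weights

weights = EfficientNet_B3_Weights.IMAGENET1K_V1
backbone = efficientnet_b3(weights=weights)
backbone.classifier = torch.nn.Identity()  # drop the classification head
backbone.eval()
for p in backbone.parameters():
    p.requires_grad = False                # assumption: frozen backbone

preprocess = weights.transforms()          # matching ImageNet preprocessing
with torch.no_grad():
    image = torch.rand(3, 300, 300)        # dummy RGB image in [0, 1]
    features = backbone(preprocess(image).unsqueeze(0))  # shape (1, 1536)
```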
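
Finally, the training parameters quoted in the setup row, collected into a single config; the dict layout is illustrative, but the numbers are the ones reported above.

```python
# Round/epoch schedule from the quoted experiment setup.
training_config = dict(
    num_rounds=100,               # N, outer interactive rounds
    trajectories_per_round=5,
    pretrain_epochs=200,
    pretrain_steps_per_epoch=300,
    round_epochs=25,
    round_steps_per_epoch=100,
)

# Implied gradient-step budget:
pretrain_steps = 200 * 300        # 60,000 pretraining steps
online_steps = 100 * 25 * 100     # 250,000 steps across all rounds
assert pretrain_steps + online_steps == 310_000
```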