Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Efficient Active Imitation Learning with Random Network Distillation
Authors: Emilien Biré, Anthony Kobanda, Ludovic Denoyer, Rémy Portelas
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our main contributions are threefold: i) We propose a new method called RND-DAgger, a novel interactive imitation learning approach leveraging state-based out-of-distribution identification through random network distillation. ii) We perform a comparative analysis of RND-DAgger and existing methods on 3 tasks: a robotics scenario and two video-game environments. iii) Throughout these experiments, we demonstrate that RND-DAgger either outperforms or matches existing approaches in terms of final performance while significantly reducing expert burden. |
| Researcher Affiliation | Collaboration | Emilien Biré¹, Anthony Kobanda², Ludovic Denoyer³, Rémy Portelas²; ¹Centrale Supelec, ²Ubisoft La Forge, ³H Company; EMAIL |
| Pseudocode | Yes | Algorithm 1: DAgger; Algorithm 2: Lazy/Ensemble DAgger; Algorithm 3: Ensemble-DAgger's CONDITION; Algorithm 4: Lazy-DAgger's CONDITION; Algorithm 5: RND-DAgger |
| Open Source Code | Yes | To ensure the reproducibility of our work, we provide detailed pseudo-code in section 2 and section 3. A comprehensive open-source codebase, including all environments, datasets, oracle model checkpoints, active learning algorithms, and a detailed guide on how to reproduce our experiments and results is available at https://sites.google.com/view/rnd-dagger. |
| Open Datasets | Yes | Our first environment is Half Cheetah which is a classical reinforcement learning environment¹ where the objective is to learn a running strategy for the agent. ... ¹https://github.com/araffin/pybullet_envs_gymnasium We also propose and open-source two new environments developed for video game research. Race Car (see Figure 5) features a physics-based car controller... Finally, the 3D Maze environment allows us to study our strategy in goal-conditioned navigation scenarios. A comprehensive open-source codebase, including all environments, datasets, oracle model checkpoints, active learning algorithms, and a detailed guide on how to reproduce our experiments and results is available at https://sites.google.com/view/rnd-dagger. |
| Dataset Splits | No | The paper mentions collecting an initial training set and then iteratively expanding it, but it does not specify explicit train/validation/test splits with percentages, counts, or methods for partitioning the data for evaluation purposes. |
| Hardware Specification | No | This work was granted access to the HPC resources of IDRIS under the allocation 2024AD011015218 made by GENCI. |
| Software Dependencies | No | The paper does not explicitly list specific software components with their version numbers (e.g., Python 3.8, PyTorch 1.9, CUDA 11.1) in the main text or appendices. |
| Experiment Setup | Yes | Hyperparameters: For each decision rule, several key hyperparameters had to be tuned. DAgger: the probability β of a frame to be controlled by the bot. ... RND-DAgger: threshold λ of OOD detection; the historic context length...; the Minimal Expert Time W; the size of the random network... Ensemble-DAgger: threshold τ for discrepancy measure; threshold χ for doubt measure; the number of models N. Lazy-DAgger: threshold βH for discrepancy measure; threshold βR for the backward controlled loop. ... Table 2 summarizes the values used for our grid search. |
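
The RND-DAgger hyperparameters quoted in the table (the OOD threshold λ, the Minimal Expert Time W, and the size of the random network) can be made more concrete with a small sketch. The code below is not the authors' implementation: it is a minimal NumPy illustration of random network distillation used as a state-based out-of-distribution score, under several assumptions. The network sizes, the ridge-regression fit of the predictor's output layer (swapped in for gradient training), the synthetic states, and the reading of W as "once triggered, the expert keeps control for at least W steps" are all hypothetical choices made for illustration. The historic context length is not modeled.

```python
import numpy as np


def make_random_net(rng, in_dim, hidden, out_dim):
    """Weights of a small random one-hidden-layer tanh network."""
    return (rng.normal(size=(in_dim, hidden)),
            rng.normal(size=hidden),
            rng.normal(size=(hidden, out_dim)))


def hidden_features(net, states):
    w1, b1, _ = net
    return np.tanh(states @ w1 + b1)


def forward(net, states):
    return hidden_features(net, states) @ net[2]


def fit_predictor(rng, target, states, hidden=64, ridge=1e-3):
    """Fit a predictor to imitate the frozen target on visited states.

    Assumption: only the output layer is fit, in closed form by ridge
    regression, instead of training the whole predictor by gradient descent.
    """
    out_dim = target[2].shape[1]
    w1, b1, _ = make_random_net(rng, states.shape[1], hidden, out_dim)
    feats = np.tanh(states @ w1 + b1)
    targets = forward(target, states)
    gram = feats.T @ feats + ridge * np.eye(hidden)
    w_out = np.linalg.solve(gram, feats.T @ targets)
    return (w1, b1, w_out)


def rnd_score(target, predictor, states):
    """Per-state OOD score: squared error between predictor and target."""
    diff = forward(predictor, states) - forward(target, states)
    return np.mean(diff ** 2, axis=1)


def expert_control_mask(scores, lam, min_expert_steps):
    """Hypothetical gating rule: a score above lam hands control to the
    expert, who then keeps it for at least min_expert_steps steps."""
    mask, remaining = [], 0
    for score in scores:
        if score > lam:
            remaining = min_expert_steps
        if remaining > 0:
            mask.append(True)
            remaining -= 1
        else:
            mask.append(False)
    return mask


rng = np.random.default_rng(0)
target = make_random_net(rng, in_dim=4, hidden=32, out_dim=8)  # frozen
visited = rng.normal(0.0, 0.1, size=(500, 4))   # synthetic "seen" states
predictor = fit_predictor(rng, target, visited)

in_scores = rnd_score(target, predictor, rng.normal(0.0, 0.1, size=(100, 4)))
ood_scores = rnd_score(target, predictor, rng.normal(5.0, 0.1, size=(100, 4)))
print("mean in-distribution score:", in_scores.mean())
print("mean out-of-distribution score:", ood_scores.mean())
```

In this sketch, states far from the visited data receive a larger score because the predictor was never fit there, which is the property a threshold such as λ relies on. Calling `expert_control_mask(scores, lam, min_expert_steps)` on a trajectory of scores then yields, under the stated assumption about W, the steps on which the expert would be in control.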