Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Conformalized Interactive Imitation Learning: Handling Expert Shift and Intermittent Feedback
Authors: Michelle Zhao, Henny Admoni, Reid Simmons, Aaditya Ramdas, Andrea Bajcsy
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare Conformal DAgger to prior uncertainty-aware DAgger methods in scenarios where the distribution shift is (and isn’t) present because of changes in the expert’s policy. We find that in simulated and hardware deployments on a 7DOF robotic manipulator, Conformal DAgger detects high uncertainty when the expert shifts and increases the number of interventions compared to baselines, allowing the robot to more quickly learn the new behavior. [...] We instantiate Conformal DAgger in a simulated 4D robot goal-reaching task and in hardware on a 7 degree-of-freedom robotic manipulator [...] |
| Researcher Affiliation | Academia | ᵃ Robotics Institute, School of Computer Science, Carnegie Mellon University; ᵇ Departments of Statistics and Machine Learning, Carnegie Mellon University |
| Pseudocode | Yes | Algorithm 1 Conformal DAgger (changes from DAgger (Ross et al., 2011) highlighted) |
| Open Source Code | No | The paper states: "Project page at cmu-intentlab.github.io/conformalized-interactive-il/." This is a project page and does not explicitly state that the source code for the methodology is provided, nor is it a direct link to a code repository. |
| Open Datasets | Yes | We test on three benchmark datasets from Angelopoulos et al. (2023): (1) Amazon stock prices, (2) Google stock prices (Nguyen, 2018), and the (3) Elec2 dataset (Harries, 1999). |
| Dataset Splits | No | The paper describes an iterative online learning process where data is aggregated after each deployment episode. For the time series datasets, it mentions a 'lookback window k = 100 timesteps' for the nonconformity score calculation but does not specify explicit training/test/validation splits for model training or evaluation of the conformal intervals. |
| Hardware Specification | No | The paper mentions "hardware deployments on a 7DOF robotic manipulator" and "a 7 degree-of-freedom robotic manipulator" and a "Meta Quest 3 remote controller". However, it does not specify the computational hardware (e.g., GPU/CPU models, memory) used for running or training the models. |
| Software Dependencies | No | The paper states that base prediction models were "all trained via darts (Herzen et al., 2022)" and refers to using a "CNN-based diffusion policy (Chi et al., 2023)" and a "ResNet-18 visual encoder". However, it does not provide specific version numbers for any of these software components or libraries. |
| Experiment Setup | Yes | Conformal DAgger uses an uncertainty threshold of 0.06, temperature β = 100, lookback window k = 100, learning rate lr = 0.6, and initial quantiles q_0^{lo,hi} = 0.01. [...] Ensemble DAgger [...] We use 3 ensemble members, an uncertainty threshold of 0.06 for the ensemble disagreement, and a safety classifier threshold of s = 0.03. Lazy DAgger uses s = 0.03 to begin human intervention, and only switches back to autonomous mode when the deviation between the learner's prediction and the expert's action is below a context-switching threshold of 0.1s. To make Safe DAgger's number of initial interventions comparable, we decrease the safety classifier threshold to s = 0.01. The initial robot policy (i = 0) is trained on a dataset D_0 of 10 expert trajectories with synthetically injected noise drawn from N(1, 0.5). |
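The hyperparameters quoted in the Experiment Setup row (lr = 0.6, initial quantile 0.01) suggest an online, adaptive-conformal-style quantile update in the spirit of Angelopoulos et al. (2023). The sketch below illustrates that generic update rule on a stream of nonconformity scores; the function names, the miscoverage level `alpha`, and the exact update form are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of an online (ACI-style) conformal quantile update,
# using the hyperparameter values quoted in the table above.
# NOTE: update rule, names, and alpha are illustrative assumptions,
# not the paper's exact algorithm.

def update_quantile(q: float, score: float, alpha: float, lr: float = 0.6) -> float:
    """One adaptive step: grow the quantile estimate q after a
    miscoverage event (score exceeded q), shrink it otherwise."""
    err = 1.0 if score > q else 0.0  # miscoverage indicator
    return q + lr * (err - alpha)

def run_stream(scores, alpha: float = 0.1, q0: float = 0.01, lr: float = 0.6):
    """Apply the update over a stream of nonconformity scores,
    returning the quantile trajectory (one value per timestep)."""
    q, history = q0, []
    for s in scores:
        history.append(q)
        q = update_quantile(q, s, alpha, lr)
    return history
```

With this rule, repeated miscoverage events push the quantile (and hence the uncertainty estimate) up quickly at lr = 0.6, which is consistent with the paper's goal of rapidly detecting expert shift and triggering interventions.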