Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Conformalized Interactive Imitation Learning: Handling Expert Shift and Intermittent Feedback
Authors: Michelle Zhao, Henny Admoni, Reid Simmons, Aaditya Ramdas, Andrea Bajcsy
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare Conformal DAgger to prior uncertainty-aware DAgger methods in scenarios where the distribution shift is (and isn’t) present because of changes in the expert’s policy. We find that in simulated and hardware deployments on a 7DOF robotic manipulator, Conformal DAgger detects high uncertainty when the expert shifts and increases the number of interventions compared to baselines, allowing the robot to more quickly learn the new behavior. [...] We instantiate Conformal DAgger in a simulated 4D robot goal-reaching task and in hardware on a 7 degree-of-freedom robotic manipulator [...] |
| Researcher Affiliation | Academia | ᵃ Robotics Institute, School of Computer Science, Carnegie Mellon University; ᵇ Departments of Statistics and Machine Learning, Carnegie Mellon University |
| Pseudocode | Yes | Algorithm 1 Conformal DAgger (changes from DAgger (Ross et al., 2011) highlighted) |
| Open Source Code | No | The paper states: "Project page at cmu-intentlab.github.io/conformalized-interactive-il/." This is a project page and does not explicitly state that the source code for the methodology is provided, nor is it a direct link to a code repository. |
| Open Datasets | Yes | We test on three benchmark datasets from Angelopoulos et al. (2023): (1) Amazon stock prices, (2) Google stock prices (Nguyen, 2018), and the (3) Elec2 dataset (Harries, 1999). |
| Dataset Splits | No | The paper describes an iterative online learning process where data is aggregated after each deployment episode. For the time series datasets, it mentions a 'lookback window k = 100 timesteps' for the nonconformity score calculation but does not specify explicit training/test/validation splits for model training or evaluation of the conformal intervals. |
| Hardware Specification | No | The paper mentions "hardware deployments on a 7DOF robotic manipulator" and "a 7 degree-of-freedom robotic manipulator" and a "Meta Quest 3 remote controller". However, it does not specify the computational hardware (e.g., GPU/CPU models, memory) used for running or training the models. |
| Software Dependencies | No | The paper states that base prediction models were "all trained via darts (Herzen et al., 2022)" and refers to using a "CNN-based diffusion policy (Chi et al., 2023)" and a "ResNet-18 visual encoder". However, it does not provide specific version numbers for any of these software components or libraries. |
| Experiment Setup | Yes | Conformal DAgger uses an uncertainty threshold of 0.06, temperature β = 100, lookback window k = 100, learning rate lr = 0.6, and initial quantiles q_0^{lo,hi} = 0.01. [...] Ensemble DAgger [...] We use 3 ensemble members, an uncertainty threshold of 0.06 for the ensemble disagreement, and a safety classifier threshold of s = 0.03. Lazy DAgger uses s = 0.03 to begin human intervention, and only switches back to autonomous mode when the deviation between the learner's prediction and the expert's action is below a context-switching threshold of 0.1s. To make Safe DAgger's number of initial interventions comparable, we decrease the safety classifier threshold to s = 0.01. The initial robot policy (i = 0) is trained on a dataset D_0 of 10 expert trajectories with synthetically injected noise drawn from N(1, 0.5). |
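The hyperparameters quoted in the Experiment Setup row (lr = 0.6, initial quantile 0.01) suggest an online, adaptive-conformal-style quantile update in the spirit of Angelopoulos et al. (2023). The sketch below illustrates that generic update rule on a stream of nonconformity scores; the function names, the miscoverage level `alpha`, and the exact update form are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of an online (ACI-style) conformal quantile update,
# using the hyperparameter values quoted in the table above.
# NOTE: update rule, names, and alpha are illustrative assumptions,
# not the paper's exact algorithm.

def update_quantile(q: float, score: float, alpha: float, lr: float = 0.6) -> float:
    """One adaptive step: grow the quantile estimate q after a
    miscoverage event (score exceeded q), shrink it otherwise."""
    err = 1.0 if score > q else 0.0  # miscoverage indicator
    return q + lr * (err - alpha)

def run_stream(scores, alpha: float = 0.1, q0: float = 0.01, lr: float = 0.6):
    """Apply the update over a stream of nonconformity scores,
    returning the quantile trajectory (one value per timestep)."""
    q, history = q0, []
    for s in scores:
        history.append(q)
        q = update_quantile(q, s, alpha, lr)
    return history
```

With this rule, repeated miscoverage events push the quantile (and hence the uncertainty estimate) up quickly at lr = 0.6, which is consistent with the paper's goal of rapidly detecting expert shift and triggering interventions.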