Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Shared Autonomy with IDA: Interventional Diffusion Assistance
Authors: Brandon McMahan, Zhenghao (Mark) Peng, Bolei Zhou, Jonathan Kao
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments with simulated human pilots show that IDA achieves higher performance than pilot-only and traditional SA control in variants of the Reacher environment and Lunar Lander. We then demonstrate that IDA achieves better control in Lunar Lander with human-in-the-loop experiments. Human participants report greater autonomy with IDA and prefer IDA over pilot-only and traditional SA control. |
| Researcher Affiliation | Academia | Brandon J. McMahan¹, Zhenghao Peng¹, Bolei Zhou¹, Jonathan C. Kao¹; ¹University of California, Los Angeles |
| Pseudocode | Yes | Algorithm 1 SA with IDA |
| Open Source Code | Yes | We provide code as a supplementary zip file. Polished code will be released with the paper along with setup and installation instructions to allow others to reproduce results as well as build future work. |
| Open Datasets | Yes | The first environment we use is Reacher, a 2D simulation environment... Following previous works ((Reddy et al., 2018; Schaff and Walter, 2020; Yoneda et al., 2023; Tan et al., 2022)), we also use Lunar Lander, a 2D continuous control environment... We modify the environment as described in (Yoneda et al., 2023)... For each environment, we collected 10 million state-action pairs from episodes using the SAC expert. |
| Dataset Splits | No | The paper describes training steps (e.g., 'trained for 3 million time steps', 'collected 10 million state-action pairs') and evaluation on different pilot types, but does not specify explicit training/validation/test dataset splits with percentages or counts for the experiments, nor how hyperparameter tuning for evaluation was performed using a validation set. |
| Hardware Specification | Yes | All training was performed on a workstation with a single 3080Ti and took approximately 48 hours to complete all three steps for our tasks. |
| Software Dependencies | No | The paper mentions key methods like Soft Actor-Critic (SAC) and Denoising Diffusion Probabilistic Models (DDPM) and their parameterizations, but does not provide specific software dependency versions (e.g., Python 3.x, PyTorch 1.x, or specific library versions). |
| Experiment Setup | Yes | We parameterize our SAC model with a four-layer MLP with 256 units in each layer and the ReLU non-linearity. We use a learning rate of 3 × 10⁻⁴ and a replay buffer size of 10⁶. The expert fully observes the environment including the goal, and is trained for 3 million time steps or until the environment is solved. |
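
The SAC expert settings quoted in the Experiment Setup row can be gathered into a small configuration object. The Python sketch below is illustrative only and is not the authors' code (their supplementary zip is the authoritative source). It assumes that "four-layer" means four hidden layers of 256 units. The names `SACConfig`, `mlp_param_count`, and `ReplayBuffer` are hypothetical. The observation and output sizes used in the example (8 inputs, 4 outputs for the mean and log-std of a 2-D action) are also hypothetical, since the modified environments' dimensions are not given here. The replay buffer is a generic fixed-capacity ring buffer of the kind commonly used with SAC.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SACConfig:
    """SAC expert hyperparameters as reported in the Experiment Setup row."""
    hidden_layers: int = 4               # assumption: "four-layer" = four hidden layers
    hidden_units: int = 256              # 256 units in each layer
    activation: str = "relu"             # ReLU non-linearity
    learning_rate: float = 3e-4          # 3 x 10^-4
    replay_buffer_size: int = 1_000_000  # 10^6 transitions
    max_env_steps: int = 3_000_000       # or until the environment is solved


def mlp_param_count(in_dim: int, out_dim: int, cfg: SACConfig) -> int:
    """Count weights plus biases of a fully connected MLP built from cfg."""
    sizes = [in_dim] + [cfg.hidden_units] * cfg.hidden_layers + [out_dim]
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))


class ReplayBuffer:
    """Hypothetical fixed-capacity ring buffer; once full, new data overwrites the oldest."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.storage = []
        self.next_idx = 0  # slot the next transition will occupy (the oldest once full)

    def add(self, transition):
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
        else:
            self.storage[self.next_idx] = transition  # evict the oldest entry
        self.next_idx = (self.next_idx + 1) % self.capacity

    def sample(self, batch_size: int, rng=None):
        """Uniform sampling with replacement, O(batch_size) per call."""
        rng = rng if rng is not None else random
        return [self.storage[rng.randrange(len(self.storage))] for _ in range(batch_size)]

    def __len__(self):
        return len(self.storage)


if __name__ == "__main__":
    cfg = SACConfig()
    # Hypothetical sizes: 8-D observation in, 4 outputs (mean and log-std of a 2-D action).
    print(mlp_param_count(8, 4, cfg))  # 200708
    buf = ReplayBuffer(capacity=3)
    for t in range(5):
        buf.add(t)
    print(buf.storage)  # [3, 4, 2]
```

With these assumed sizes the network has roughly 0.2 million parameters, dominated by the three 256×256 hidden-to-hidden layers. The ring buffer keeps memory bounded at 10⁶ transitions over the 3 million training steps, so older experience is gradually replaced.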