Shared Autonomy with IDA: Interventional Diffusion Assistance

Authors: Brandon McMahan, Zhenghao (Mark) Peng, Bolei Zhou, Jonathan Kao

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments with simulated human pilots show that IDA achieves higher performance than pilot-only and traditional SA control in variants of the Reacher environment and Lunar Lander. We then demonstrate that IDA achieves better control in Lunar Lander with human-in-the-loop experiments. Human participants report greater autonomy with IDA and prefer IDA over pilot-only and traditional SA control.
Researcher Affiliation | Academia | Brandon J. McMahan¹, Zhenghao Peng¹, Bolei Zhou¹, Jonathan C. Kao¹; ¹University of California, Los Angeles. bmcmahan2025@g.ucla.edu, pzh@cs.ucla.edu, bolei@cs.ucla.edu, kao@seas.ucla.edu
Pseudocode | Yes | Algorithm 1: SA with IDA (see the intervention-loop sketch after this table).
Open Source Code | Yes | We provide code as a supplementary zip file. Polished code will be released with the paper along with setup and installation instructions to allow others to reproduce results as well as build future work.
Open Datasets | Yes | The first environment we use is Reacher, a 2D simulation environment... Following previous works (Reddy et al., 2018; Schaff and Walter, 2020; Yoneda et al., 2023; Tan et al., 2022), we also use Lunar Lander, a 2D continuous control environment... We modify the environment as described in Yoneda et al. (2023)... For each environment, we collected 10 million state-action pairs from episodes using the SAC expert (see the data-collection sketch after this table).
Dataset Splits | No | The paper reports training details (e.g., "trained for 3 million time steps", "collected 10 million state-action pairs") and evaluation on different pilot types, but it does not specify explicit train/validation/test splits (by percentage or count), nor how hyperparameters were tuned on a held-out validation set.
Hardware Specification | Yes | All training was performed on a workstation with a single 3080Ti and took approximately 48 hours to complete all three steps for our tasks.
Software Dependencies | No | The paper mentions key methods such as Soft Actor-Critic (SAC) and Denoising Diffusion Probabilistic Models (DDPM) and their parameterizations, but does not provide specific software dependency versions (e.g., Python 3.x, PyTorch 1.x, or specific library versions).
Experiment Setup | Yes | We parameterize our SAC model with a four-layer MLP with 256 units in each layer and the ReLU non-linearity. We use a learning rate of 3 × 10⁻⁴ and a replay buffer size of 10⁶. The expert fully observes the environment including the goal, and is trained for 3 million time steps or until the environment is solved. (See the parameterization sketch after this table.)
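
Regarding the Pseudocode row: Algorithm 1 (SA with IDA) is only named above, so the following is a minimal sketch of what an intervention-style shared-autonomy step could look like, assuming a Gymnasium-style environment and three hypothetical callables (`pilot_policy`, `copilot_action`, `should_intervene`) standing in for the paper's pilot, diffusion copilot, and intervention criterion. This is an illustration of the general pattern, not the paper's implementation.

```python
def ida_step(env, obs, pilot_policy, copilot_action, should_intervene):
    """One control step of an intervention-style shared-autonomy loop.

    The pilot proposes an action, the copilot proposes an alternative, and an
    intervention test decides which action is actually executed. All three
    callables are hypothetical placeholders for the paper's components.
    """
    a_pilot = pilot_policy(obs)               # human / surrogate pilot proposal
    a_copilot = copilot_action(obs, a_pilot)  # copilot's correction of that proposal
    # Intervene only when the copilot is judged better than the pilot;
    # otherwise defer to the pilot to preserve the user's sense of autonomy.
    a_exec = a_copilot if should_intervene(obs, a_pilot, a_copilot) else a_pilot
    obs, reward, terminated, truncated, info = env.step(a_exec)
    return obs, reward, terminated or truncated
```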
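Regarding the Open Datasets row: the 10 million state-action pairs are described as being collected from episodes of a trained SAC expert. Below is a minimal sketch of that kind of rollout collection, assuming a Gymnasium environment and an `expert.predict(obs)` interface; the environment id, helper names, and array layout are illustrative assumptions, not the paper's code.

```python
import numpy as np
import gymnasium as gym

def collect_expert_data(expert, env_id="LunarLander-v2", n_pairs=10_000_000):
    """Roll out a trained expert policy and store (state, action) pairs."""
    env = gym.make(env_id, continuous=True)    # continuous-control Lunar Lander (assumed id)
    states, actions = [], []
    obs, _ = env.reset()
    while len(states) < n_pairs:
        action = expert.predict(obs)           # hypothetical expert interface
        states.append(obs)
        actions.append(action)
        obs, _, terminated, truncated, _ = env.step(action)
        if terminated or truncated:
            obs, _ = env.reset()
    return np.asarray(states), np.asarray(actions)
```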
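Regarding the Experiment Setup row: a minimal PyTorch sketch of the stated parameterization (four-layer MLP, 256 units per layer, ReLU, learning rate 3 × 10⁻⁴) follows. The class name, the choice of Adam, and the Lunar Lander dimensions are assumptions, and the full SAC machinery (twin critics, entropy tuning, the 10⁶-step replay buffer) is omitted.

```python
import torch
import torch.nn as nn

class MLPPolicy(nn.Module):
    """Four-layer MLP with 256 units per layer and ReLU, as described in the setup."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

# Hyperparameters quoted from the setup row; the optimizer choice is an assumption.
policy = MLPPolicy(obs_dim=8, act_dim=2)   # Lunar Lander dims, for illustration only
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)
REPLAY_BUFFER_SIZE = 10**6
```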