Shared Autonomy with IDA: Interventional Diffusion Assistance
Authors: Brandon McMahan, Zhenghao (Mark) Peng, Bolei Zhou, Jonathan Kao
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments with simulated human pilots show that IDA achieves higher performance than pilot-only and traditional SA control in variants of the Reacher environment and Lunar Lander. We then demonstrate that IDA achieves better control in Lunar Lander with human-in-the-loop experiments. Human participants report greater autonomy with IDA and prefer IDA over pilot-only and traditional SA control. |
| Researcher Affiliation | Academia | Brandon J. McMahan, Zhenghao Peng, Bolei Zhou, Jonathan C. Kao (University of California, Los Angeles); bmcmahan2025@g.ucla.edu, pzh@cs.ucla.edu, bolei@cs.ucla.edu, kao@seas.ucla.edu |
| Pseudocode | Yes | Algorithm 1: SA with IDA (an illustrative control-loop sketch appears below the table) |
| Open Source Code | Yes | We provide code as a supplementary zip file. Polished code will be released with the paper, along with setup and installation instructions, to allow others to reproduce results as well as build future work. |
| Open Datasets | Yes | The first environment we use is Reacher, a 2D simulation environment... Following previous works (Reddy et al., 2018; Schaff and Walter, 2020; Yoneda et al., 2023; Tan et al., 2022), we also use Lunar Lander, a 2D continuous control environment... We modify the environment as described in (Yoneda et al., 2023)... For each environment, we collected 10 million state-action pairs from episodes using the SAC expert (a data-collection sketch appears below the table). |
| Dataset Splits | No | The paper describes training steps (e.g., 'trained for 3 million time steps', 'collected 10 million state-action pairs') and evaluation on different pilot types, but does not specify explicit training/validation/test splits (percentages or counts), nor whether a validation set was used for hyperparameter tuning. |
| Hardware Specification | Yes | All training was performed on a workstation with a single 3080Ti and took approximately 48 hours to complete all three steps for our tasks. |
| Software Dependencies | No | The paper mentions key methods like Soft Actor-Critic (SAC) and Denoising Diffusion Probabilistic Models (DDPM) and their parameterizations, but does not provide specific software dependency versions (e.g., Python 3.x, PyTorch 1.x, or specific library versions). |
| Experiment Setup | Yes | We parameterize our SAC model with a four-layer MLP with 256 units in each layer and the ReLU non-linearity. We use a learning rate of 3 × 10⁻⁴ and a replay buffer size of 10⁶. The expert fully observes the environment including the goal, and is trained for 3 million time steps or until the environment is solved. (A configuration sketch appears below the table.) |
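
The pseudocode row refers to the paper's Algorithm 1 (SA with IDA). As a rough illustration of how a shared-autonomy step with an interventional copilot could be wired together, the sketch below assumes a Gymnasium-style environment and treats `pilot`, `copilot`, and `intervention_score` as placeholder callables; the actual intervention criterion and the copilot's conditioning are defined by the paper's Algorithm 1, not by this code.

```python
def run_episode_with_ida(env, pilot, copilot, intervention_score, threshold=0.0):
    """Roll out one episode under a hypothetical SA-with-IDA control loop.

    `pilot(obs)` returns the pilot's action, `copilot(obs, a_pilot)` returns the
    diffusion copilot's proposal, and `intervention_score(obs, a_copilot, a_pilot)`
    estimates how much better the copilot's action is expected to be than the
    pilot's. All three interfaces are illustrative, not the paper's.
    """
    obs, _ = env.reset()
    total_reward, done = 0.0, False
    while not done:
        a_pilot = pilot(obs)                   # human or surrogate pilot action
        a_copilot = copilot(obs, a_pilot)      # diffusion copilot proposal
        # Intervene only when the copilot is expected to outperform the pilot.
        action = a_copilot if intervention_score(obs, a_copilot, a_pilot) > threshold else a_pilot
        obs, reward, terminated, truncated, _ = env.step(action)
        total_reward += reward
        done = terminated or truncated
    return total_reward


# Example usage (assumed environment id and placeholder policies):
# import gymnasium as gym
# env = gym.make("LunarLander-v2", continuous=True)
# score = run_episode_with_ida(env, pilot, copilot, intervention_score)
```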
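The Open Datasets row notes that 10 million state-action pairs were collected per environment from SAC expert rollouts. A minimal collection loop consistent with that description might look like the following; the `expert_policy(obs) -> action` interface and the array-based buffer are assumptions, not the authors' released code.

```python
import numpy as np


def collect_expert_pairs(env, expert_policy, n_pairs=10_000_000):
    """Collect (state, action) pairs from rollouts of a trained expert policy."""
    states, actions = [], []
    obs, _ = env.reset()
    while len(states) < n_pairs:
        action = expert_policy(obs)            # trained SAC expert (assumed callable)
        states.append(obs)
        actions.append(action)
        obs, _, terminated, truncated, _ = env.step(action)
        if terminated or truncated:
            obs, _ = env.reset()
    return np.asarray(states), np.asarray(actions)
```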
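The Experiment Setup row reports a four-layer, 256-unit ReLU MLP trained with a learning rate of 3 × 10⁻⁴ and a replay buffer of 10⁶ transitions. A hedged PyTorch sketch of that parameterization is below; the Gaussian policy head and the Lunar Lander dimensions (8-D observation, 2-D continuous action) are illustrative assumptions, and the paper's modified environment may differ.

```python
import torch
import torch.nn as nn


def make_mlp(in_dim: int, out_dim: int, hidden: int = 256, depth: int = 4) -> nn.Sequential:
    """Four hidden layers of 256 ReLU units, matching the reported SAC parameterization."""
    layers, prev = [], in_dim
    for _ in range(depth):
        layers += [nn.Linear(prev, hidden), nn.ReLU()]
        prev = hidden
    layers.append(nn.Linear(prev, out_dim))
    return nn.Sequential(*layers)


# Assumed Lunar Lander dimensions: 8-D observation, 2-D continuous action.
obs_dim, act_dim = 8, 2
actor = make_mlp(obs_dim, 2 * act_dim)        # mean and log-std per action dim (assumed head)
critic = make_mlp(obs_dim + act_dim, 1)       # Q(s, a) head
optimizer = torch.optim.Adam(actor.parameters(), lr=3e-4)  # reported learning rate
REPLAY_BUFFER_SIZE = 10**6                                  # reported replay buffer size
```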