Human-AI Shared Control via Policy Dissection

Authors: Quanyi Li, Zhenghao Peng, Haibin Wu, Lan Feng, Bolei Zhou

NeurIPS 2022

Reproducibility Assessment

Each entry below lists a reproducibility variable, the assessed result, and the supporting LLM response.
Research Type: Experimental
LLM Response: We evaluate the proposed approach on many RL tasks, such as autonomous driving and locomotion. The experiments show that the human-AI shared control system enabled by Policy Dissection can substantially improve performance and safety in unseen traffic scenes on the driving task. We evaluate the proposed method on several RL agents, ranging from locomotion robots to autonomous driving agents, in simulation environments. Experimental results suggest that meaningful motor primitives emerge in the internal representation. In the quantitative evaluation, we use the human-AI shared control system enabled by Policy Dissection to improve generalization in unseen test-time environments and achieve zero-shot task transfer.
Researcher Affiliation: Academia
LLM Response: Quanyi Li, Zhenghao Peng, Haibin Wu, Lan Feng, Bolei Zhou. Affiliations: Centre for Perceptual and Interactive Intelligence; ETH Zurich; University of Edinburgh; University of California, Los Angeles.
Pseudocode: No
LLM Response: The paper describes the steps of the Policy Dissection method in text, but it does not include any formal pseudocode or algorithm blocks (e.g., blocks labeled 'Algorithm' or 'Pseudocode').
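For orientation, here is a minimal sketch of what such an algorithm block might look like, based only on the paper's textual description: hidden units of a trained policy are associated with kinematic attributes, and the matched unit is then stimulated at inference time to evoke the corresponding behavior. All names are illustrative, and the simple Pearson correlation below is a stand-in for the paper's unit-kinematics association analysis, not the authors' exact procedure.

```python
import numpy as np

def dissect_policy(activations, kinematics):
    """Associate each kinematic attribute with the hidden unit whose
    activation trace tracks it most closely.

    activations: (T, num_units) array recorded from policy rollouts.
    kinematics:  dict mapping attribute name (e.g. 'yaw rate') to a
                 (T,) array of measurements from the same rollouts.
    """
    primitives = {}
    for name, signal in kinematics.items():
        # Pearson correlation of every unit's trace with this attribute.
        corrs = [np.corrcoef(activations[:, i], signal)[0, 1]
                 for i in range(activations.shape[1])]
        primitives[name] = int(np.argmax(np.abs(corrs)))
    return primitives

def stimulate(hidden, primitives, command, strength=1.0):
    """Override the unit tied to a human command (e.g. 'turn left')
    before the policy's output layer, evoking that motor primitive."""
    hidden = hidden.copy()
    hidden[primitives[command]] = strength
    return hidden
```
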
Open Source Code: Yes
LLM Response: Code and demo videos are available at https://metadriverse.github.io/policydissect.
Open Datasets: Yes
LLM Response: The experiments conducted on Cassie and ANYmal in Isaac Gym [57], Ant and Walker in MuJoCo [68], and the BipedalWalker [9] follow the general setting. Other experiments, conducted in the MetaDrive and PyBullet-A1 environments, are introduced as follows: MetaDrive. We train agents to accomplish the autonomous driving task in the safe-driving environment of MetaDrive [39]. PyBullet-A1. The legged-robot locomotion experiments are conducted on the PyBullet Unitree A1 [16] robot.
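Most of these benchmarks are publicly available simulators. As a rough illustration (assuming the classic Gym registry; the version suffixes depend on the installed gym and mujoco builds, and Isaac Gym, MetaDrive, and PyBullet ship as separate packages), the MuJoCo and Box2D tasks can be instantiated as follows:

```python
import gym

# IDs follow the classic Gym registry; the "-v3" suffixes may differ
# depending on the installed gym and mujoco versions.
ant = gym.make("Ant-v3")                # MuJoCo Ant
walker = gym.make("Walker2d-v3")        # MuJoCo Walker
bipedal = gym.make("BipedalWalker-v3")  # Box2D BipedalWalker

obs = ant.reset()
# Pre-0.26 Gym step API; Gymnasium returns a 5-tuple instead.
obs, reward, done, info = ant.step(ant.action_space.sample())
```
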
Dataset Splits: No
LLM Response: The paper mentions a 'Training Set and Test Set' and describes the training and testing environments, but it does not explicitly mention a validation set or provide details about split percentages or sample counts.
Hardware Specification: No
LLM Response: The paper mentions simulation environments such as Isaac Gym and PyBullet, but it does not specify any concrete hardware details such as GPU models, CPU types, or memory capacity used to run the experiments.
Software Dependencies: No
LLM Response: The paper mentions using PPO [61], SAC [23], and environments such as Isaac Gym [57], MuJoCo [68], and PyBullet [16], but it does not provide specific version numbers for any software dependencies or libraries used in the experiments.
Experiment Setup: Yes
LLM Response: As shown in Fig. 4, for both the autonomous driving and robot locomotion tasks, we prepare a training set and a held-out test set. For the driving task, we train autonomous driving policies in 50 different environments where the traffic condition is mild (an average of 2 cars per map) and all traffic vehicles and the target vehicle can drive smoothly toward their destinations. No obstacles are present in these environments. At test time, the trained agents are evaluated in 20 unseen maps with higher traffic density (an average of 6 cars per map). In addition, obstacles such as traffic cones and broken-down vehicles are scattered randomly on the road. The evaluation is repeated 5 times for each agent, and likewise repeated 5 times on the exclusive test environment.
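The described split maps naturally onto MetaDrive configuration. The sketch below is an illustration under stated assumptions, not the authors' released code: the config keys follow the metadrive-simulator package (names vary slightly across releases, e.g. environment_num vs. num_scenarios), and the traffic_density and accident_prob values are illustrative stand-ins for the "2 vs. 6 cars per map" and random-obstacle settings.

```python
from metadrive.envs.safe_metadrive_env import SafeMetaDriveEnv

# 50 mild-traffic training maps with no obstacles.
train_env = SafeMetaDriveEnv(dict(
    environment_num=50,   # number of procedurally generated maps
    start_seed=0,
    traffic_density=0.1,  # illustrative: ~2 vehicles per map
    accident_prob=0.0,    # no cones or broken-down vehicles
))

# 20 held-out maps: disjoint seeds, denser traffic, random obstacles.
test_env = SafeMetaDriveEnv(dict(
    environment_num=20,
    start_seed=1000,      # disjoint from training seeds -> unseen maps
    traffic_density=0.3,  # illustrative: ~6 vehicles per map
    accident_prob=0.8,    # scatter obstacles on the roads
))

obs = test_env.reset()
```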