Efficient Learning of Safe Driving Policy via Human-AI Copilot Optimization
Authors: Quanyi Li, Zhenghao Peng, Bolei Zhou
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experiments show that HACO achieves a substantially high sample efficiency in the safe driving benchmark. It can train agents to drive in unseen traffic scenes with a handful of human intervention budget and achieve high safety and generalizability, outperforming both reinforcement learning and imitation learning baselines with a large margin. |
| Researcher Affiliation | Academia | Quanyi Li¹, Zhenghao Peng², Bolei Zhou³ (¹Centre for Perceptual and Interactive Intelligence, ²The Chinese University of Hong Kong, ³University of California, Los Angeles) |
| Pseudocode | Yes | Algorithm 1: The workflow of HACO during training |
| Open Source Code | Yes | Code and demo videos are available at: https://decisionforce.github.io/HACO/. |
| Open Datasets | Yes | We employ a lightweight driving simulator MetaDrive (Li et al., 2021), which preserves the capacity to evaluate the safety and generalizability in unseen environments. ... Though we mainly describe the setting of MetaDrive in this section, we also experiment on CARLA (Dosovitskiy et al., 2017) simulator in Sec. 4.3. |
| Dataset Splits | No | The paper states: 'We split the driving scenes into the training set and test set with 50 different scenes in each set.' However, it does not explicitly mention the existence or details of a validation set split. |
| Hardware Specification | Yes | When training the baselines, we host 8 concurrent trials in an Nvidia GeForce RTX 2080 Ti GPU. Each trial consumes 2 CPUs with 8 parallel rollout workers. The main experiments of HACO is conducted on a local computer with an Nvidia GeForce RTX 2070 and repeat 3 times. |
| Software Dependencies | No | The paper mentions implementing the algorithms with RLlib and states that the simulator is built on Panda3D and the Bullet physics engine, but it does not give version numbers for any of these components. |
| Experiment Setup | Yes | Information about other hyper-parameters is given in the Appendix. Appendix E lists hyper-parameters for HACO (Table 4) and the baselines (Tables 5 to 11), including Discounted Factor γ, Learning Rate, Train Batch Size, etc. |
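The Dataset Splits row describes two disjoint pools of 50 driving scenes each, with no validation set. The minimal sketch below illustrates that layout. It assumes, hypothetically, that each scene is identified by an integer seed and that the two pools are contiguous seed ranges. The function name `split_scenes` and the starting seed of 0 are illustrative only and are not taken from the paper or the MetaDrive API.

```python
def split_scenes(num_train: int = 50, num_test: int = 50, start_seed: int = 0):
    """Return disjoint train/test scene-seed ranges (hypothetical layout).

    Assumes scenes are indexed by consecutive integer seeds; the paper only
    states that there are 50 training scenes and 50 test scenes.
    """
    train = range(start_seed, start_seed + num_train)
    # Test seeds start right after the last training seed, so the pools never overlap.
    test = range(start_seed + num_train, start_seed + num_train + num_test)
    return train, test


train_seeds, test_seeds = split_scenes()
# No validation pool is carved out, mirroring what the paper reports.
print(len(train_seeds), len(test_seeds))  # 50 50
print(train_seeds[0], train_seeds[-1], test_seeds[0], test_seeds[-1])  # 0 49 50 99
```

Under this assumption, any held-out validation scenes would have to be taken from the training pool, because the test pool is reserved for measuring generalization to unseen scenes.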
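The Hardware Specification row implies several aggregate resource figures for the baseline runs. The arithmetic below derives them from the quoted numbers. It assumes the 8 concurrent trials shared the single RTX 2080 Ti evenly. That even GPU split is an assumption, because the paper does not state the per-trial GPU fraction.

```python
from fractions import Fraction

# Figures quoted in the Hardware Specification row (baseline training).
concurrent_trials = 8
cpus_per_trial = 2
rollout_workers_per_trial = 8
gpus = 1  # one Nvidia GeForce RTX 2080 Ti

total_cpus = concurrent_trials * cpus_per_trial  # 16
total_rollout_workers = concurrent_trials * rollout_workers_per_trial  # 64
# Assumption: the single GPU is divided evenly across the concurrent trials.
gpu_share_per_trial = Fraction(gpus, concurrent_trials)  # 1/8
# On average, 4 rollout workers share each CPU within a trial.
workers_per_cpu = Fraction(rollout_workers_per_trial, cpus_per_trial)  # 4

print(total_cpus, total_rollout_workers, gpu_share_per_trial, workers_per_cpu)
# 16 64 1/8 4
```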
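The Software Dependencies row notes that the paper gives no version numbers for RLlib, Panda3D, or Bullet. When rerunning the experiments, one way to close that gap is to record the versions actually installed. The sketch below uses the standard-library `importlib.metadata`. The distribution names are assumptions about how the environment was installed: `ray` for RLlib, `panda3d`, and `metadrive-simulator`. Bullet is also assumed to come bundled with Panda3D rather than as a separate package.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in distributions:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Not installed in this environment; record the gap explicitly.
            found[name] = None
    return found


# Assumed distribution names; RLlib ships inside the `ray` package.
report = installed_versions(["ray", "panda3d", "metadrive-simulator"])
for name, ver in report.items():
    print(f"{name}=={ver}" if ver else f"{name}: not installed")
```

If this output is saved next to the results, the exact dependency versions behind a rerun can be checked later.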