Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Efficient Learning of Safe Driving Policy via Human-AI Copilot Optimization
Authors: Quanyi Li, Zhenghao Peng, Bolei Zhou
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experiments show that HACO achieves a substantially high sample efficiency in the safe driving benchmark. It can train agents to drive in unseen traffic scenes with a handful of human intervention budget and achieve high safety and generalizability, outperforming both reinforcement learning and imitation learning baselines with a large margin. |
| Researcher Affiliation | Academia | Quanyi Li¹, Zhenghao Peng², Bolei Zhou³; ¹Centre for Perceptual and Interactive Intelligence, ²The Chinese University of Hong Kong, ³University of California, Los Angeles |
| Pseudocode | Yes | Algorithm 1: The workflow of HACO during training |
| Open Source Code | Yes | Code and demo videos are available at: https://decisionforce.github.io/HACO/. |
| Open Datasets | Yes | We employ a lightweight driving simulator MetaDrive (Li et al., 2021), which preserves the capacity to evaluate the safety and generalizability in unseen environments. ... Though we mainly describe the setting of MetaDrive in this section, we also experiment on CARLA (Dosovitskiy et al., 2017) simulator in Sec. 4.3. |
| Dataset Splits | No | The paper states: 'We split the driving scenes into the training set and test set with 50 different scenes in each set.' However, it does not mention a validation split. |
| Hardware Specification | Yes | When training the baselines, we host 8 concurrent trials in an Nvidia GeForce RTX 2080 Ti GPU. Each trial consumes 2 CPUs with 8 parallel rollout workers. The main experiments of HACO is conducted on a local computer with an Nvidia GeForce RTX 2070 and repeat 3 times. |
| Software Dependencies | No | The paper mentions implementing the algorithms with RLlib and notes that the simulator is based on Panda3D and Bullet Engine, but it gives no version numbers for these software components. |
| Experiment Setup | Yes | Information about other hyper-parameters is given in the Appendix. Appendix E lists hyper-parameters for HACO (Table 4) and the baselines (Tables 5-11), including Discounted Factor γ, Learning Rate, and Train Batch Size. |