An Extensible Framework for Open Heterogeneous Collaborative Perception
Authors: Yifan Lu, Yue Hu, Yiqi Zhong, Dequan Wang, Yanfeng Wang, Siheng Chen
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on OPV2V-H and DAIR-V2X datasets show that HEAL surpasses SOTA methods in performance while reducing the training parameters by 91.5% when integrating 3 new agent types. We further implement a comprehensive codebase at: https://github.com/yifanlu0227/HEAL. |
| Researcher Affiliation | Collaboration | Yifan Lu1,4, Yue Hu1,4, Yiqi Zhong2, Dequan Wang1,3, Yanfeng Wang1,3, Siheng Chen1,3,4; 1 Shanghai Jiao Tong University, 2 University of Southern California, 3 Shanghai AI Lab, 4 Multi-Agent Governance & Intelligence Crew (MAGIC); 1 {yifanlu, 18671129361, dequanwang, wangyanfeng, sihengc}@sjtu.edu.cn; 2 yiqizhon@usc.edu |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. The methods are described using text and mathematical equations. |
| Open Source Code | Yes | We further implement a comprehensive codebase at: https://github.com/yifanlu0227/HEAL. |
| Open Datasets | Yes | To evaluate HEAL and further promote open heterogeneous collaborative perception, we propose a large-scale heterogeneous collaborative perception dataset, OPV2V-H, which supplements more sensor types based on the existing OPV2V (Xu et al., 2022c). Extensive experiments on OPV2V-H and real-world dataset DAIR-V2X (Yu et al., 2022) show HEAL's remarkable performance. |
| Dataset Splits | Yes | In total, OPV2V-H has 10,524 samples, including 6374/1980/2170 in train/validation/test split, respectively. (These counts are sanity-checked in the snippet after the table.) |
| Hardware Specification | Yes | Training costs 5 hours for the collaboration base on 2 RTX 3090 GPUs and 3 hours for each new agent's training, while Xiang et al. (2023) takes more than 1 day to converge with 4 agent types together. Metrics related to the training cost are all measured with batch size 1 on 1 RTX A40. |
| Software Dependencies | No | The paper mentions various models and optimizers (e.g., Adam, ResNeXt, PointPillars, Lift-Splat) and simulation environments (OpenCDA, CARLA), but does not specify software dependencies with version numbers (e.g., PyTorch 1.x, CUDA 11.x). |
| Experiment Setup | Yes | The multi-scale feature dimension of Pyramid Fusion is [64, 128, 256]. The ResNeXt layers have [3, 5, 8] blocks each. Foreground estimators are 1×1 convolutions with channels [64, 128, 256]. The hyper-parameter α_ℓ = {0.4, 0.2, 0.1} for ℓ = 1, 2, 3. We incorporated depth supervision for all camera detections to help convergence. We train the collaboration base and new agent types both for 25 epochs end-to-end with Adam, reducing the learning rate from 0.002 by 0.1 at epoch 15. |
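For readers reconstructing the Experiment Setup row, the following is a minimal PyTorch sketch of the reported per-scale hyper-parameters and optimizer schedule. The module layout and names (`foreground_estimators`, the placeholder `model`) are illustrative assumptions, not the actual classes in the HEAL repository.

```python
# Minimal sketch of the quoted Pyramid Fusion hyper-parameters and training
# schedule. Names below are assumptions for illustration, not the HEAL API.
import torch
import torch.nn as nn

# Hyper-parameters quoted from the paper.
PYRAMID_CHANNELS = [64, 128, 256]   # multi-scale feature dimensions
RESNEXT_BLOCKS   = [3, 5, 8]        # ResNeXt blocks per scale
ALPHA            = [0.4, 0.2, 0.1]  # per-scale foreground loss weights alpha_l

# Foreground estimators: one 1x1 convolution per scale (single-channel
# foreground map assumed as the output).
foreground_estimators = nn.ModuleList(
    [nn.Conv2d(c, 1, kernel_size=1) for c in PYRAMID_CHANNELS]
)

# Placeholder for the collaboration base / new-agent encoder being trained.
model = nn.Sequential(nn.Conv2d(3, PYRAMID_CHANNELS[0], 3, padding=1))

# 25 epochs end-to-end with Adam, lr 0.002 decayed by 0.1 at epoch 15.
optimizer = torch.optim.Adam(model.parameters(), lr=2e-3)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[15], gamma=0.1
)

for epoch in range(25):
    # ... forward pass, per-scale foreground losses weighted by ALPHA,
    # detection loss, and (for camera agents) depth supervision ...
    optimizer.step()
    scheduler.step()
```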
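The Dataset Splits row reports 6374/1980/2170 train/validation/test samples for OPV2V-H; the snippet below is a trivial sanity check that these counts sum to the stated 10,524 total.

```python
# Sanity check: the OPV2V-H split sizes quoted above sum to the reported total.
splits = {"train": 6374, "validation": 1980, "test": 2170}
total = sum(splits.values())
assert total == 10524, f"unexpected total: {total}"
print(splits, "->", total)  # -> 10524
```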