An Extensible Framework for Open Heterogeneous Collaborative Perception

Authors: Yifan Lu, Yue Hu, Yiqi Zhong, Dequan Wang, Yanfeng Wang, Siheng Chen

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on the OPV2V-H and DAIR-V2X datasets show that HEAL surpasses SOTA methods in performance while reducing the training parameters by 91.5% when integrating 3 new agent types. We further implement a comprehensive codebase at: https://github.com/yifanlu0227/HEAL.
Researcher Affiliation | Collaboration | Yifan Lu (1,4), Yue Hu (1,4), Yiqi Zhong (2), Dequan Wang (1,3), Yanfeng Wang (1,3), Siheng Chen (1,3,4; corresponding author). Affiliations: 1 Shanghai Jiao Tong University, 2 University of Southern California, 3 Shanghai AI Lab, 4 Multi-Agent Governance & Intelligence Crew (MAGIC). Emails: {yifan_lu, 18671129361, dequanwang, wangyanfeng, sihengc}@sjtu.edu.cn; yiqizhon@usc.edu
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. The methods are described using text and mathematical equations.
Open Source Code | Yes | We further implement a comprehensive codebase at: https://github.com/yifanlu0227/HEAL.
Open Datasets | Yes | To evaluate HEAL and further promote open heterogeneous collaborative perception, we propose a large-scale heterogeneous collaborative perception dataset, OPV2V-H, which supplements more sensor types based on the existing OPV2V (Xu et al., 2022c). Extensive experiments on OPV2V-H and the real-world dataset DAIR-V2X (Yu et al., 2022) show HEAL's remarkable performance.
Dataset Splits | Yes | In total, OPV2V-H has 10,524 samples, including 6,374/1,980/2,170 in the train/validation/test split, respectively. (These counts are cross-checked in the sketch after this table.)
Hardware Specification | Yes | Training costs 5 hours for the collaboration base on 2 RTX 3090 GPUs and 3 hours for each new agent's training, while Xiang et al. (2023) takes more than 1 day to converge with 4 agent types together. Metrics related to the training cost are all measured with batch size 1 on 1 RTX A40.
Software Dependencies | No | The paper mentions various models and optimizers (e.g., Adam, ResNeXt, PointPillars, Lift-Splat) and simulation environments (OpenCDA, CARLA), but does not specify software dependencies with version numbers (e.g., PyTorch 1.x, CUDA 11.x).
Experiment Setup | Yes | The multi-scale feature dimension of Pyramid Fusion is [64, 128, 256]. The ResNeXt layers have [3, 5, 8] blocks each. Foreground estimators are 1×1 convolutions with channels [64, 128, 256]. The hyper-parameters are α_ℓ = {0.4, 0.2, 0.1} for ℓ = 1, 2, 3. We incorporated depth supervision for all camera detections to help convergence. We train the collaboration base and each new agent type for 25 epochs end-to-end with Adam, reducing the learning rate from 0.002 by a factor of 0.1 at epoch 15. (A minimal training-schedule sketch follows this table.)
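
As a quick arithmetic check on the OPV2V-H splits quoted in the "Dataset Splits" row, the per-split counts do sum to the stated total. The sketch below is illustrative only; the split sizes come from the paper, but the dictionary layout and variable names are ours, not part of the HEAL codebase.

# Hypothetical sanity check on the OPV2V-H split sizes quoted above.
opv2v_h_splits = {"train": 6374, "validation": 1980, "test": 2170}

total = sum(opv2v_h_splits.values())
assert total == 10524, f"split sizes sum to {total}, expected 10524"
print({name: f"{count} ({count / total:.1%})" for name, count in opv2v_h_splits.items()})
# -> roughly 60.6% / 18.8% / 20.6% for train / validation / test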
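
For the "Experiment Setup" row, the quoted hyper-parameters can be collected into a training configuration. The PyTorch-style snippet below is a minimal sketch of the reported optimizer and schedule only (Adam, learning rate 0.002, decayed by 0.1 at epoch 15, 25 epochs); the model stub and all variable names are assumptions for illustration, not the authors' implementation, and the framework choice is likewise our assumption.

import torch

# Hyper-parameters as reported in the paper (see the Experiment Setup row).
FEATURE_DIMS = [64, 128, 256]   # multi-scale feature dims of Pyramid Fusion
RESNEXT_BLOCKS = [3, 5, 8]      # blocks per ResNeXt layer
ALPHAS = [0.4, 0.2, 0.1]        # per-scale loss weights alpha_l, l = 1, 2, 3
EPOCHS = 25

# Stand-in model (a single 1x1 conv, echoing the foreground estimators);
# the real architecture lives in the HEAL codebase.
model = torch.nn.Conv2d(FEATURE_DIMS[0], FEATURE_DIMS[0], kernel_size=1)

# Adam with the reported schedule: lr 0.002, reduced by a factor of 0.1 at epoch 15.
optimizer = torch.optim.Adam(model.parameters(), lr=0.002)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[15], gamma=0.1)

for epoch in range(EPOCHS):
    # ... a real loop would run one pass over OPV2V-H or DAIR-V2X here,
    # compute the detection loss, and backpropagate before stepping ...
    optimizer.step()
    scheduler.step()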