Asynchrony-Robust Collaborative Perception via Bird's Eye View Flow

Authors: Sizhe Wei, Yuxi Wei, Yue Hu, Yifan Lu, Yiqi Zhong, Siheng Chen, Ya Zhang

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments conducted on both IRV2V and the real-world dataset DAIR-V2X show that CoBEVFlow consistently outperforms other baselines and is robust in extremely asynchronous settings. The code is available at https://github.com/MediaBrain-SJTU/CoBEVFlow.
Researcher Affiliation | Collaboration | 1) Cooperative Medianet Innovation Center, Shanghai Jiao Tong University; 2) University of Southern California; 3) Shanghai AI Laboratory. Emails: {sizhewei, wyx3590236732, 18671129361, yifan_lu}@sjtu.edu.cn, {sihengc, ya_zhang}@sjtu.edu.cn (1); yiqizhon@usc.edu (2)
Pseudocode | No | The paper describes the system architecture and processes in text and with equations, but does not include a formal pseudocode block or algorithm listing.
Open Source Code | Yes | The code is available at https://github.com/MediaBrain-SJTU/CoBEVFlow.
Open Datasets | Yes | To facilitate research on asynchrony for collaborative perception, we simulate the first collaborative perception dataset with different temporal asynchronies based on CARLA [39], named IRregular V2V (IRV2V). ... DAIR-V2X [14] is a real-world collaborative perception dataset.
Dataset Splits | Yes | We have split the dataset into training, validation, and testing sets, which contain 5,445, 994, and 2,010 samples, respectively.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions using PointPillars [40] and adopting settings from CoAlign [17] for the backbone, but does not provide specific version numbers for software dependencies like Python, PyTorch, or CUDA.
Experiment Setup | Yes | We conduct training for a total of 60 epochs, starting with an initial learning rate of 2e-3. Subsequently, at the 10th and 20th epochs, the learning rate decreases to 10% of its previous value. For the IRV2V dataset, we set the lidar range as x ∈ [−140.8, +140.8] m, y ∈ [−40, +40] m. The voxel size is h = w = 0.4 m. The feature map's size is H = 200, W = 704. For the DAIR-V2X dataset, we set the lidar range as x ∈ [−100.8, +100.8] m, y ∈ [−40, +40] m. The voxel size is h = w = 0.4 m. The feature map's size is H = 200, W = 504.
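The training schedule and BEV grid settings quoted above can be expressed as a short configuration sketch. This is a minimal illustration only, assuming a PyTorch-style training loop; `model` is a placeholder module and the names `IRV2V_CFG` / `DAIR_V2X_CFG` are hypothetical, not taken from the released CoBEVFlow code.

```python
import torch

# Placeholder module standing in for the PointPillars-based detector.
model = torch.nn.Linear(10, 10)

# Schedule quoted in the paper: 60 epochs, initial lr 2e-3,
# decayed to 10% of its previous value at epochs 10 and 20.
optimizer = torch.optim.Adam(model.parameters(), lr=2e-3)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[10, 20], gamma=0.1)

# BEV grid settings quoted in the paper (voxel size h = w = 0.4 m).
IRV2V_CFG = {
    "lidar_range_m": {"x": (-140.8, 140.8), "y": (-40.0, 40.0)},
    "voxel_size_m": 0.4,
    "feature_map_hw": (200, 704),   # H x W
}
DAIR_V2X_CFG = {
    "lidar_range_m": {"x": (-100.8, 100.8), "y": (-40.0, 40.0)},
    "voxel_size_m": 0.4,
    "feature_map_hw": (200, 504),
}

for epoch in range(60):
    # ... one pass over the training set would go here ...
    optimizer.step()   # placeholder for the actual per-batch updates
    scheduler.step()   # lr becomes 2e-4 after epoch 10 and 2e-5 after epoch 20
```

As a consistency check, the quoted feature-map sizes follow from the ranges and voxel size: 2 × 140.8 / 0.4 = 704 (IRV2V width), 2 × 100.8 / 0.4 = 504 (DAIR-V2X width), and 2 × 40 / 0.4 = 200 (height in both cases).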