Learning Distilled Collaboration Graph for Multi-Agent Perception

Authors: Yiming Li, Shunli Ren, Pengxiang Wu, Siheng Chen, Chen Feng, Wenjun Zhang

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our quantitative and qualitative experiments in multi-agent 3D object detection show that DiscoNet could not only achieve a better performance-bandwidth trade-off than the state-of-the-art collaborative perception methods, but also bring more straightforward design rationale.
Researcher Affiliation | Academia | Yiming Li (New York University, yimingli@nyu.edu); Shunli Ren (Shanghai Jiao Tong University, renshunli@sjtu.edu.cn); Pengxiang Wu (Rutgers University, pxiangwu@gmail.com); Siheng Chen (Shanghai Jiao Tong University, sihengc@sjtu.edu.cn); Chen Feng (New York University, cfeng@nyu.edu); Wenjun Zhang (Shanghai Jiao Tong University, zhangwenjun@sjtu.edu.cn)
Pseudocode | No | The paper describes its algorithmic steps in paragraph form but does not include structured pseudocode or algorithm blocks (a hedged sketch of the described fusion step is given after the table).
Open Source Code | Yes | Our code is available on https://github.com/ai4ce/DiscoNet.
Open Datasets | Yes | To validate the proposed method, we build V2X-Sim 1.0, a new large-scale multi-agent 3D object detection dataset in autonomous driving scenarios based on the CARLA-SUMO co-simulation platform [6]. The V2X-Sim 1.0 dataset is maintained on https://ai4ce.github.io/V2X-Sim/, and the first version of V2X-Sim used in this work includes the LiDAR-based V2V scenario.
Dataset Splits | Yes | We use 8,000/900/1,100 frames for training/validation/testing.
Hardware Specification | Yes | We train all the models using an NVIDIA GeForce RTX 3090 GPU.
Software Dependencies | No | The paper mentions CARLA and SUMO for dataset synthesis but does not provide version numbers for the software dependencies used to implement or train the models (e.g., Python, PyTorch, CUDA).
Experiment Setup | Yes | We set the width/length of each voxel as 0.25 m and the height as 0.4 m; therefore the BEV map input to the student/teacher encoder has a dimension of 256 × 256 × 13. The hyperparameter λ_kd is set as 10^5. We train all the models using an NVIDIA GeForce RTX 3090 GPU. ... Each epoch consists of 2,000 iterations. (The grid arithmetic and the role of λ_kd are worked through in the second sketch below.)
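
Since the collaboration step is described only in prose, the following is a minimal PyTorch sketch of how a per-location edge-weight fusion of BEV feature maps could look. EdgeWeightNet, collaborate, the 1×1-conv scorer, and the softmax normalization across senders are illustrative assumptions under that reading, not the authors' implementation; see the repository linked above for the real code.

```python
import torch
import torch.nn as nn


class EdgeWeightNet(nn.Module):
    """Predict a per-location scalar score for one (receiver, sender) pair."""

    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convolutions over the concatenated feature pair.
        self.net = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, kernel_size=1),
        )

    def forward(self, receiver: torch.Tensor, sender: torch.Tensor) -> torch.Tensor:
        # receiver, sender: (B, C, H, W) BEV feature maps in the receiver's frame.
        return self.net(torch.cat([receiver, sender], dim=1))  # (B, 1, H, W)


def collaborate(ego_feat, neighbor_feats, edge_net):
    """Fuse the ego BEV feature map with neighbors' maps (assumed already
    warped into the ego coordinate frame) using softmax-normalized weights."""
    feats = [ego_feat] + list(neighbor_feats)
    # One score map per incoming edge, including the self-loop.
    scores = torch.cat([edge_net(ego_feat, f) for f in feats], dim=1)  # (B, N, H, W)
    weights = torch.softmax(scores, dim=1)  # weights sum to 1 at each BEV cell
    return sum(w.unsqueeze(1) * f for w, f in zip(weights.unbind(dim=1), feats))


# Example: an ego agent fusing feature maps from two neighbors.
if __name__ == "__main__":
    edge_net = EdgeWeightNet(channels=32)
    ego = torch.randn(1, 32, 256, 256)
    neighbors = [torch.randn(1, 32, 256, 256) for _ in range(2)]
    fused = collaborate(ego, neighbors, edge_net)
    print(fused.shape)  # torch.Size([1, 32, 256, 256])
```

Normalizing the scores with a per-cell softmax keeps the fused map on the same scale as the inputs, which is one plausible reading of the paper's per-location edge weights.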
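
To make the setup numbers concrete, here is a small sanity check of the grid and loss weighting. Only the voxel sizes, the 256 × 256 × 13 grid, and λ_kd = 10^5 come from the row above; the 64 m × 64 m × 5.2 m region of interest is back-solved from those values, and total_loss with its argument names is hypothetical.

```python
# Consistency check for the reported BEV setup. The voxel sizes and the
# 256 x 256 x 13 grid are quoted from the paper; the region of interest
# below is back-solved from them, not stated in the source.
voxel_xy, voxel_z = 0.25, 0.4          # meters (reported)
extent_xy, extent_z = 64.0, 5.2        # meters (assumed: grid * voxel size)

grid_xy = round(extent_xy / voxel_xy)  # 64.0 / 0.25 = 256
grid_z = round(extent_z / voxel_z)     # 5.2 / 0.4  = 13
assert (grid_xy, grid_xy, grid_z) == (256, 256, 13)

# Loss combination as described: detection loss plus the knowledge-
# distillation term weighted by lambda_kd = 1e5. Variable names are
# hypothetical; only the weight value comes from the paper.
LAMBDA_KD = 1e5

def total_loss(det_loss: float, kd_loss: float) -> float:
    return det_loss + LAMBDA_KD * kd_loss
```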