Learning Distilled Collaboration Graph for Multi-Agent Perception
Authors: Yiming Li, Shunli Ren, Pengxiang Wu, Siheng Chen, Chen Feng, Wenjun Zhang
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our quantitative and qualitative experiments in multi-agent 3D object detection show that DiscoNet could not only achieve a better performance-bandwidth trade-off than the state-of-the-art collaborative perception methods, but also bring more straightforward design rationale. |
| Researcher Affiliation | Academia | Yiming Li (New York University, yimingli@nyu.edu); Shunli Ren (Shanghai Jiao Tong University, renshunli@sjtu.edu.cn); Pengxiang Wu (Rutgers University, pxiangwu@gmail.com); Siheng Chen (Shanghai Jiao Tong University, sihengc@sjtu.edu.cn); Chen Feng (New York University, cfeng@nyu.edu); Wenjun Zhang (Shanghai Jiao Tong University, zhangwenjun@sjtu.edu.cn) |
| Pseudocode | No | The paper describes algorithmic steps in paragraph form, but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available on https://github.com/ai4ce/DiscoNet. |
| Open Datasets | Yes | To validate the proposed method, we build V2X-Sim 1.0, a new large-scale multi-agent 3D object detection dataset in autonomous driving scenarios based on the CARLA and SUMO co-simulation platform [6]. The V2X-Sim 1.0 dataset is maintained on https://ai4ce.github.io/V2X-Sim/, and the first version of V2X-Sim used in this work includes the LiDAR-based V2V scenario. |
| Dataset Splits | Yes | We use 8,000/900/1,100 frames for training/validation/testing. |
| Hardware Specification | Yes | We train all the models using NVIDIA GeForce RTX 3090 GPU. |
| Software Dependencies | No | The paper mentions CARLA and SUMO for dataset synthesis but does not provide specific version numbers for software dependencies used to implement or train the models (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | We set the width/length of each voxel as 0.25 meter, and the height as 0.4 meter; therefore the BEV map input to the student/teacher encoder has a dimension of 256 × 256 × 13 (see the voxelization sketch below the table). The hyperparameter λ_kd is set as 10^5. We train all the models using NVIDIA GeForce RTX 3090 GPU. ... each epoch consists of 2,000 iterations. |
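
The voxel sizes and BEV grid reported in the Experiment Setup row imply a fixed perception range. As a minimal sketch, assuming a 64 m × 64 m ground area and a 5.2 m vertical extent (values inferred here from the stated 256 × 256 × 13 grid, not given explicitly in the quoted text), the arithmetic works out as follows:

```python
# Sketch of the voxelization arithmetic behind the 256 x 256 x 13 BEV input.
# Only the voxel sizes and the final grid shape come from the paper; the
# perception range below is an assumption chosen to reproduce that shape.

voxel_w = voxel_l = 0.25  # meters, width/length of each voxel (stated)
voxel_h = 0.4             # meters, height of each voxel (stated)

range_xy = 64.0           # assumed: 64 m x 64 m ground area -> 256 bins per axis
range_z = 5.2             # assumed: 5.2 m vertical extent -> 13 height slices

bev_w = int(range_xy / voxel_w)  # 64.0 / 0.25 = 256
bev_l = int(range_xy / voxel_l)  # 64.0 / 0.25 = 256
bev_h = int(range_z / voxel_h)   # 5.2 / 0.4 = 13

print(f"BEV input to the student/teacher encoder: {bev_w} x {bev_l} x {bev_h}")
```

Any ground area and vertical extent with the same ratios would yield the same grid, so the assumed range should be read as one consistent instantiation rather than the paper's exact configuration.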