Where2comm: Communication-Efficient Collaborative Perception via Spatial Confidence Maps

Authors: Yue Hu, Shaoheng Fang, Zixing Lei, Yiqi Zhong, Siheng Chen

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To evaluate Where2comm, we consider 3D object detection in both real-world and simulation scenarios with two modalities (camera/LiDAR) and two agent types (cars/drones) on four datasets: OPV2V, V2X-Sim, DAIR-V2X, and our original CoPerception-UAVs. Where2comm consistently outperforms previous methods; for example, it achieves more than 100,000× lower communication volume and still outperforms DiscoNet and V2X-ViT on OPV2V. Our code is available at https://github.com/MediaBrain-SJTU/where2comm.
Researcher Affiliation | Collaboration | Yue Hu, Shaoheng Fang, Zixing Lei (Cooperative Medianet Innovation Center, Shanghai Jiao Tong University; {18671129361, shfang, chezacarss}@sjtu.edu.cn); Yiqi Zhong (University of Southern California; yiqizhon@usc.edu); Siheng Chen (Shanghai Jiao Tong University, Shanghai AI Laboratory; sihengc@sjtu.edu.cn)
Pseudocode | Yes | Also see an algorithmic summary in Algorithm 1 and the optimization-oriented design rationale in Section 7.3 of the Appendix.
Open Source Code | Yes | Our code is available at https://github.com/MediaBrain-SJTU/where2comm.
Open Datasets | Yes | To evaluate Where2comm, we consider 3D object detection in both real-world and simulation scenarios with two modalities (camera/LiDAR) and two agent types (cars/drones) on four datasets: OPV2V, V2X-Sim, DAIR-V2X, and our original CoPerception-UAVs.
Dataset Splits | Yes | To train the overall system, we supervise two tasks: spatial confidence generation and object detection at each round. As mentioned before, the functionality of the spatial confidence generator is the same as the classification in the detection decoder. To promote parameter efficiency, our spatial confidence generator reuses the parameters of the detection decoder. For the multi-round setting, each round is supervised with one detection loss; the overall loss is $L = \sum_{k=0}^{K} \sum_{i=1}^{N} L_{\mathrm{det}}(\hat{O}_i^{(k)}, O_i)$, where $O_i$ is the $i$-th agent's ground-truth objects and $L_{\mathrm{det}}$ is the detection loss [28]. Training strategy for the multi-round setting: to adapt to multi-round communication and dynamic bandwidth, we train the model under various communication settings with a curriculum learning strategy [29]. We first gradually increase the communication bandwidth and number of rounds, and then randomly sample the bandwidth and round to promote robustness. Through this training strategy, a single model can perform well under various communication conditions.
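As a concrete illustration of the supervision and curriculum described in the row above, the following is a minimal sketch, not the authors' released code; helper names such as detection_loss, multi_round_loss, and sample_communication_setting are hypothetical, and smooth L1 stands in for the detection loss of [28].

```python
# Minimal sketch of the multi-round supervision and curriculum sampling
# described above; not the authors' code.
import random
import torch
import torch.nn.functional as F

def detection_loss(pred, target):
    # Placeholder for L_det; the paper uses the detection loss of [28].
    return F.smooth_l1_loss(pred, target)

def multi_round_loss(preds_per_round, gt_per_agent):
    """L = sum_{k=0..K} sum_{i=1..N} L_det(O_hat_i^(k), O_i).

    preds_per_round: list over rounds k; each entry holds one prediction
    tensor per agent. gt_per_agent: one ground-truth tensor per agent."""
    total = 0.0
    for round_preds in preds_per_round:                      # k = 0 .. K
        for pred_i, gt_i in zip(round_preds, gt_per_agent):  # i = 1 .. N
            total = total + detection_loss(pred_i, gt_i)
    return total

def sample_communication_setting(step, warmup_steps, max_rounds, max_bandwidth):
    """Curriculum: first gradually increase bandwidth and rounds, then sample
    them randomly so a single model handles varying communication conditions."""
    if step < warmup_steps:
        frac = (step + 1) / warmup_steps
        return max(1, round(frac * max_rounds)), frac * max_bandwidth
    return random.randint(1, max_rounds), random.uniform(0.0, max_bandwidth)
```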
Hardware Specification | No | The paper does not specify the hardware used (e.g., GPU or CPU models, memory). It only mentions that the experiments cover real-world and simulation scenarios with different agent types and sensors.
Software Dependencies | No | The paper mentions various frameworks and models used (e.g., CaDDN, MotionNet, DVDET, PointPillars) but does not provide specific version numbers for software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | For the camera-only 3D object detection task on OPV2V, we implement the detector following CaDDN [31]. The input front-view image size is (416, 160). The front-view input feature map is transformed to BEV with a resolution of 0.5 m/pixel. ... For the LiDAR-based 3D object detection task, our detector follows MotionNet [33]. We discretize 3D points into a BEV map of size (256, 256, 13) with a resolution of 0.4 m/pixel in length and width and 0.25 m in height. ... For the camera-only 3D object detection task on CoPerception-UAVs, our detector follows DVDET [8]. The input aerial image size is (800, 450). The aerial-view input feature map is transformed to BEV with a resolution of 0.25 m/pixel and a size of (192, 352); see more details in the Appendix. ... For the LiDAR-based 3D object detection task, our detector follows PointPillars [35]. We represent the field of view as a BEV map of size (200, 504, 64) with a resolution of 0.4 m/pixel in length and width.
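For reference, the detector settings quoted above can be collected in one place. The dictionary below is an illustrative restatement only: the field and key names are assumptions, while the numeric values are taken from the quoted experiment setup; it is not the authors' configuration file.

```python
# Illustrative restatement of the experiment setup quoted above; field names
# are assumptions, values come from the paper's description.
DETECTOR_SETUPS = {
    "opv2v_camera": {              # follows CaDDN [31]
        "input_image_size": (416, 160),      # front-view image size
        "bev_resolution_m_per_px": 0.5,
    },
    "lidar_motionnet": {           # follows MotionNet [33]
        "bev_map_size": (256, 256, 13),
        "resolution_length_width_m_per_px": 0.4,
        "resolution_height_m": 0.25,
    },
    "coperception_uavs_camera": {  # follows DVDET [8]
        "input_image_size": (800, 450),      # aerial image size
        "bev_resolution_m_per_px": 0.25,
        "bev_feature_size": (192, 352),
    },
    "lidar_pointpillars": {        # follows PointPillars [35]
        "bev_map_size": (200, 504, 64),
        "resolution_length_width_m_per_px": 0.4,
    },
}
```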