Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Pragmatic Heterogeneous Collaborative Perception via Generative Communication Mechanism
Authors: Junfei Zhou, Penglin Dai, Quanmin Wei, Bingyi Liu, Xiao Wu, Jianping Wang
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments conducted on the OPV2V-H, DAIR-V2X and V2X-Real datasets demonstrate that GenComm outperforms existing state-of-the-art methods, achieving an 81% reduction in both computational cost and parameter count when incorporating new agents. Our code is available at https://github.com/jeffreychou777/GenComm. Extensive experiments on the OPV2V-H[12], DAIR-V2X[14] and V2X-Real[15] datasets demonstrate that GenComm outperforms state-of-the-art baselines in both simulated and real-world heterogeneous settings. |
| Researcher Affiliation | Academia | Junfei Zhou (1, 2), Penglin Dai (1, 2), Quanmin Wei (1, 2), Bingyi Liu (3), Xiao Wu (1, 2), Jianping Wang (4); (1) Southwest Jiaotong University; (2) Engineering Research Center of Sustainable Urban Intelligent Transportation, Ministry of Education, China; (3) Wuhan University of Technology; (4) City University of Hong Kong |
| Pseudocode | Yes | Here, we present the algorithmic pipeline of the proposed method in Algorithm 1, which summarizes the key steps and provides a clear overview of the overall mechanism. Algorithm 1: GenComm: Overall Algorithmic Pipeline |
| Open Source Code | Yes | Our code is available at https://github.com/jeffreychou777/GenComm. |
| Open Datasets | Yes | We conduct experiments on three datasets: OPV2V-H[12], DAIR-V2X[14] and V2X-Real[15]. OPV2V-H is an extension of the large-scale OPV2V[16] dataset... DAIR-V2X is the first real-world dataset... V2X-Real is also a real-world dataset... |
| Dataset Splits | No | The paper refers to OPV2V-H[12], DAIR-V2X[14], and V2X-Real[15] datasets, which are standard benchmarks. While the paper describes perception ranges for training and testing, it does not explicitly provide the specific training, validation, and test split percentages or sample counts used for these datasets, nor does it explicitly state that standard splits were used for reproduction. |
| Hardware Specification | Yes | All methods, including baselines and Gen Comm, are trained under the same settings for fair comparison on NVIDIA RTX 3090. |
| Software Dependencies | No | The paper mentions optimizers like Adam[42] and refers to models and techniques typically implemented in frameworks like PyTorch, but it does not specify explicit version numbers for any software dependencies such as Python, PyTorch, or CUDA versions. |
| Experiment Setup | Yes | For all pre-trained models using AttFuse[16] as the fusion network, we train for 20 epochs with an initial learning rate of 0.002, using the Adam[42] optimizer. The learning rate is decayed by a factor of 0.1 at the 10th and 15th epochs. For pre-trained models using V2X-ViT[17] as the fusion network, training is conducted for 30 epochs with the same initial learning rate, which is decayed by 0.1 at the 15th and 20th epochs. Based on the pretrained base models, BackAlign, MPDA, and CodeFilling are fine-tuned for 10 additional epochs with an initial learning rate of 0.001, and the learning rate is decayed by a factor of 0.1 at epoch 5. For STAMP, we follow its training schedule: the model is fine-tuned for 5 epochs with an initial learning rate of 0.01, and the learning rate is decayed by 0.1 at epochs 1, 3, and 4. ... In our collaborative perception setting, the maximum communication range is set to 70 m. During training, the perception range for LiDAR-equipped agents is set to [-102.4 m, 102.4 m] along the x-axis and [-51.2 m, 51.2 m] along the y-axis, covering a total area of 204.8 m × 102.4 m. In contrast, agents equipped with camera sensors have a limited perception range of [-51.2 m, 51.2 m] along both axes (i.e., 102.4 m × 102.4 m). ... The intermediate feature maps have spatial dimensions of [C, H, W] = [128, 64, 128]... For camera-based agents, ... the feature map size is reduced to [128, 64, 64]. ... We design the transmitted information to have a shape of [C′, H_j, W_j], where C′ = 2 denotes the number of channels. The height H_j and width W_j are determined dynamically based on the receiving agent's spatial configuration. The diffusion model is configured with a total time step T = 3, and the denoising network ϵ_θ is implemented as a U-Net with 2 layers. Within the Channel Enhancer module, the channel dimensions of F_res and F_conv are both set to 64. |
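The step-decay schedules and perception ranges quoted in the Experiment Setup row can be cross-checked with a small sketch. It is not the authors' code: `StepSchedule`, `bev_cell_size`, and the `SCHEDULES` keys are hypothetical names. The sketch assumes 0-based epoch indices and the closed form used by PyTorch's `torch.optim.lr_scheduler.MultiStepLR` (the rate is multiplied by `gamma` once per milestone reached). It also assumes that the feature-map width W spans the x-range and the height H spans the y-range; under that assumption, both the LiDAR and camera maps work out to about 1.6 m per cell.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class StepSchedule:
    """Hypothetical container for a step-decay learning-rate schedule."""
    epochs: int
    base_lr: float
    milestones: tuple
    gamma: float = 0.1

    def lr_at(self, epoch: int) -> float:
        # Assumption: 0-based epochs, same closed form as MultiStepLR, i.e.
        # multiply by gamma once for every milestone <= epoch.
        if not 0 <= epoch < self.epochs:
            raise ValueError(f"epoch {epoch} outside [0, {self.epochs})")
        return self.base_lr * self.gamma ** bisect_right(self.milestones, epoch)


# Schedules as reported in the setup; the dictionary keys are illustrative labels.
SCHEDULES = {
    "pretrain_attfuse": StepSchedule(20, 0.002, (10, 15)),
    "pretrain_v2xvit": StepSchedule(30, 0.002, (15, 20)),
    "finetune_backalign_mpda_codefilling": StepSchedule(10, 0.001, (5,)),
    "finetune_stamp": StepSchedule(5, 0.01, (1, 3, 4)),
}


def bev_cell_size(x_range, y_range, height, width):
    """Metres per feature-map cell along (x, y).

    Assumes the H x W map tiles the whole range, with W along x and H along y.
    """
    x_extent = x_range[1] - x_range[0]
    y_extent = y_range[1] - y_range[0]
    return x_extent / width, y_extent / height


if __name__ == "__main__":
    for name, sched in SCHEDULES.items():
        # Distinct learning rates visited over the run, highest first.
        lrs = [sched.lr_at(e) for e in range(sched.epochs)]
        print(name, sorted(set(lrs), reverse=True))
    # LiDAR: x in [-102.4, 102.4] m, y in [-51.2, 51.2] m, features [128, 64, 128].
    print("lidar", bev_cell_size((-102.4, 102.4), (-51.2, 51.2), height=64, width=128))
    # Camera: [-51.2, 51.2] m on both axes, features [128, 64, 64].
    print("camera", bev_cell_size((-51.2, 51.2), (-51.2, 51.2), height=64, width=64))
```

In a PyTorch training loop, a schedule such as the AttFuse one would typically be expressed as `MultiStepLR(optimizer, milestones=[10, 15], gamma=0.1)`, with `scheduler.step()` called once per epoch. Whether "the 10th epoch" means index 9 or index 10 is not stated in the excerpt, so the milestone values above are one reasonable reading.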