Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Pragmatic Heterogeneous Collaborative Perception via Generative Communication Mechanism

Authors: Junfei Zhou, Penglin Dai, Quanmin Wei, Bingyi Liu, Xiao Wu, Jianping Wang

NeurIPS 2025

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments conducted on the OPV2V-H, DAIR-V2X and V2X-Real datasets demonstrate that GenComm outperforms existing state-of-the-art methods, achieving an 81% reduction in both computational cost and parameter count when incorporating new agents. Our code is available at https://github.com/jeffreychou777/GenComm. Extensive experiments on the OPV2V-H [12], DAIR-V2X [14] and V2X-Real [15] datasets demonstrate that GenComm outperforms state-of-the-art baselines in both simulated and real-world heterogeneous settings. |
| Researcher Affiliation | Academia | Junfei Zhou (1, 2), Penglin Dai (1, 2), Quanmin Wei (1, 2), Bingyi Liu (3), Xiao Wu (1, 2), Jianping Wang (4). (1) Southwest Jiaotong University; (2) Engineering Research Center of Sustainable Urban Intelligent Transportation, Ministry of Education, China; (3) Wuhan University of Technology; (4) City University of Hong Kong |
| Pseudocode | Yes | Here, we present the algorithmic pipeline of the proposed method in Algorithm 1, which summarizes the key steps and provides a clear overview of the overall mechanism. Algorithm 1: GenComm: Overall Algorithmic Pipeline |
| Open Source Code | Yes | Our code is available at https://github.com/jeffreychou777/GenComm. |
| Open Datasets | Yes | We conduct experiments on three datasets: OPV2V-H [12], DAIR-V2X [14] and V2X-Real [15]. OPV2V-H is an extension of the large-scale OPV2V [16] dataset... DAIR-V2X is the first real-world dataset... V2X-Real is also a real-world dataset... |
| Dataset Splits | No | The paper refers to the OPV2V-H [12], DAIR-V2X [14], and V2X-Real [15] datasets, which are standard benchmarks. While the paper describes perception ranges for training and testing, it does not explicitly provide the specific training, validation, and test split percentages or sample counts used for these datasets, nor does it explicitly state that standard splits were used for reproduction. |
| Hardware Specification | Yes | All methods, including baselines and GenComm, are trained under the same settings for fair comparison on NVIDIA RTX 3090. |
| Software Dependencies | No | The paper mentions optimizers like Adam [42] and refers to models and techniques typically implemented in frameworks like PyTorch, but it does not specify explicit version numbers for any software dependencies such as Python, PyTorch, or CUDA versions. |
| Experiment Setup | Yes | For all pre-trained model using AttFuse [16] as the fusion network, we train for 20 epochs with an initial learning rate of 0.002, using the Adam [42] optimizer. The learning rate is decayed by a factor of 0.1 at the 10th and 15th epochs. For pre-trained model using V2X-ViT [17] as the fusion network, training is conducted for 30 epochs with the same initial learning rate, which is decayed by 0.1 at the 15th and 20th epochs. Based on the pretrained base models, BackAlign, MPDA, and CodeFilling are fine-tuned for 10 additional epochs with an initial learning rate of 0.001, and the learning rate is decayed by a factor of 0.1 at epoch 5. For STAMP, we follow its training schedule: the model is fine-tuned for 5 epochs with an initial learning rate of 0.01, and the learning rate is decayed by 0.1 at epochs 1, 3, and 4. ... In our collaborative perception setting, the maximum communication range is set to 70 m. During training, the perception range for LiDAR-equipped agents is set to [-102.4 m, 102.4 m] along the x-axis and [-51.2 m, 51.2 m] along the y-axis, covering a total area of 204.8 m × 102.4 m. In contrast, agents equipped with camera sensors have a limited perception range of [-51.2 m, 51.2 m] along both axes (i.e., 102.4 m × 102.4 m). ... The intermediate feature maps have spatial dimensions of [C, H, W] = [128, 64, 128]... For camera-based agents, ... the feature map size is reduced to [128, 64, 64]. ... We design the transmitted information to have a shape of [C′, H_j, W_j], where C′ = 2 denotes the number of channels. The height H_j and width W_j are determined dynamically based on the receiving agent's spatial configuration. The diffusion model is configured with a total time step T = 3, and the denoising network ϵ_θ is implemented as a U-Net with 2 layers. Within the Channel Enhancer module, the channel dimensions of F_res and F_conv are both set to 64. |
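The training schedules and perception ranges quoted in the Experiment Setup entry can be checked with a few lines of arithmetic. The following is a minimal sketch in plain Python, not the authors' code: the dictionary keys, function names, and the 0-indexed reading of "decayed at epoch N" (the same convention PyTorch's `MultiStepLR` uses) are assumptions made for illustration, as is the mapping of feature-map width to the x-axis and height to the y-axis.

```python
from bisect import bisect_right

# Schedules as quoted in the Experiment Setup entry.
# The key names are hypothetical labels, not identifiers from the paper.
SCHEDULES = {
    "attfuse_pretrain": {"epochs": 20, "base_lr": 0.002, "milestones": [10, 15]},
    "v2xvit_pretrain": {"epochs": 30, "base_lr": 0.002, "milestones": [15, 20]},
    "baseline_finetune": {"epochs": 10, "base_lr": 0.001, "milestones": [5]},
    "stamp_finetune": {"epochs": 5, "base_lr": 0.01, "milestones": [1, 3, 4]},
}


def step_decay_lr(epoch, base_lr, milestones, gamma=0.1):
    """Learning rate used during `epoch` under step decay.

    Assumption: epochs are 0-indexed and the rate is multiplied by `gamma`
    once for every milestone that is less than or equal to `epoch`.
    """
    passed = bisect_right(sorted(milestones), epoch)
    return base_lr * gamma ** passed


def lr_trace(name):
    """Per-epoch learning rates for one of the quoted schedules."""
    cfg = SCHEDULES[name]
    return [
        step_decay_lr(e, cfg["base_lr"], cfg["milestones"])
        for e in range(cfg["epochs"])
    ]


def cell_size(range_min, range_max, cells):
    """Metres covered by one feature-map cell along a single axis."""
    return (range_max - range_min) / cells


# Assumption: W indexes the x-axis and H indexes the y-axis.
LIDAR_CELL = (cell_size(-102.4, 102.4, 128), cell_size(-51.2, 51.2, 64))
CAMERA_CELL = (cell_size(-51.2, 51.2, 64), cell_size(-51.2, 51.2, 64))

if __name__ == "__main__":
    for name in SCHEDULES:
        print(name, [f"{lr:.0e}" for lr in lr_trace(name)])
    print("LiDAR cell size (x, y) in m:", LIDAR_CELL)
    print("Camera cell size (x, y) in m:", CAMERA_CELL)
```

Under these assumptions, the LiDAR and camera feature maps come out at the same spatial resolution per cell, which is consistent with the camera map being half as wide (64 instead of 128) over half the x-range. Whether the paper counts epochs from 0 or 1 is not stated in the quoted text, so the exact epoch at which each decay takes effect may differ by one.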