Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Revisiting Generative Infrared and Visible Image Fusion Based on Human Cognitive Laws

Authors: Lin Guo, Xiaoqing Luo, Wei Xie, Zhancheng Zhang, Hui Li, Rui Wang, Zhenhua Feng, Xiaoning Song

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that the proposed method achieves state-of-the-art fusion performance in qualitative and quantitative evaluations across multiple datasets and significantly improves semantic segmentation metrics. This fully demonstrates the advantages of this generative image fusion method, drawing inspiration from human cognition, in enhancing structural consistency and detail quality.
Researcher Affiliation | Academia | (1) School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, China; (2) School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China. Email: zczhang@usts.edu.cn
Pseudocode | Yes | B. Algorithm: HCLFuse first applies an optimal-transport-based mapping T to the infrared image X, aligning its distribution with that of the visible image Y and thereby improving the optimization lower bound of the mutual-information objective. The aligned pair (T(X), Y) is then fed into a multi-scale, mask-regulated variational bottleneck encoder (VBE) to compress and model the latent representation z, so that z captures modality-discriminative and compact features under an unsupervised learning setting. Subsequently, z is refined through a reverse-time diffusion generation process, in which physically guided constraints are dynamically injected at each denoising timestep to regulate the evolution of latent features. Finally, the optimized latent representation z_0 is decoded to produce the fused image F. The pseudocode implementations of both the training and inference procedures are provided in Algorithm 1 and Algorithm 2, respectively.
Open Source Code | Yes | The source code is available at https://github.com/lxq-jnu/HCLFuse
Open Datasets | Yes | HCLFuse is evaluated on four public datasets: MSRS [30], TNO [31], FMB [21] and MFNet [32], covering diverse conditions such as urban driving, nighttime military scenes, and adverse weather.
Dataset Splits | No | In the experiments, a subset is sampled to ensure diversity and representative coverage: 361 pairs are selected from MSRS, 42 pairs from TNO, 280 pairs from FMB, and 393 pairs from MFNet. These selected subsets are used to validate the generalization capability of HCLFuse across varying scenes and lighting conditions.
Hardware Specification | Yes | All experimental evaluations are performed on a computational platform equipped with an NVIDIA GeForce RTX 3090 GPU and an Intel(R) Core(TM) i7-6850K CPU operating at 3.60 GHz.
Software Dependencies | No | The paper mentions using the Adam optimizer and the Mask2Former framework but does not specify version numbers for any software libraries or dependencies, such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | The Adam optimizer with a learning rate of 2 × 10^-5 is used for parameter updates.
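
The experiment setup entry names only the optimizer and its learning rate. As a rough illustration of what that setting controls, the following is a minimal sketch of a single Adam update in plain Python. Only the learning rate (2e-5) comes from the paper. The beta and epsilon values are the commonly used Adam defaults and are assumptions here, and the helper name `adam_step` is hypothetical. The authors' actual training code lives in the repository listed above and may differ.

```python
import math

LEARNING_RATE = 2e-5  # stated in the paper's experiment setup
# Assumed values: common Adam defaults, not stated in the paper.
BETA1, BETA2, EPS = 0.9, 0.999, 1e-8


def adam_step(params, grads, m, v, t, lr=LEARNING_RATE):
    """Apply one Adam update to a list of scalar parameters (hypothetical helper).

    m and v hold the running first and second moment estimates;
    t is the 1-based step count used for bias correction.
    """
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = BETA1 * m_i + (1 - BETA1) * g        # first moment (mean of gradients)
        v_i = BETA2 * v_i + (1 - BETA2) * g * g    # second moment (mean of squared gradients)
        m_hat = m_i / (1 - BETA1 ** t)             # bias-corrected first moment
        v_hat = v_i / (1 - BETA2 ** t)             # bias-corrected second moment
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + EPS))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


# Toy usage: three steps on f(w) = w^2, starting from w = 1.0.
w, m, v = [1.0], [0.0], [0.0]
for step in range(1, 4):
    grads = [2 * w[0]]  # derivative of w^2
    w, m, v = adam_step(w, grads, m, v, step)
```

Because Adam normalizes the gradient by its running magnitude, the first step moves each parameter by approximately the learning rate regardless of how large the raw gradient is, which is why a value as small as 2e-5 yields very gradual updates.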