Image Fusion via Vision-Language Model
Authors: Zixiang Zhao, Lilun Deng, Haowen Bai, Yukun Cui, Zhipeng Zhang, Yulun Zhang, Haotong Qin, Dongdong Chen, Jiangshe Zhang, Peng Wang, Luc Van Gool
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | FILM has shown promising results in four image fusion tasks: infrared-visible, medical, multi-exposure, and multi-focus image fusion. We also propose a vision-language dataset containing ChatGPT-generated paragraph descriptions for the eight image fusion datasets across four fusion tasks, facilitating future research in vision-language model-based image fusion. |
| Researcher Affiliation | Academia | ¹Xi'an Jiaotong University, China; ²ETH Zürich, Switzerland; ³Northwestern Polytechnical University, China; ⁴MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, China; ⁵Heriot-Watt University, United Kingdom; ⁶KU Leuven, Belgium; ⁷INSAIT, Bulgaria. |
| Pseudocode | No | The paper describes the workflow and components of the FILM algorithm verbally and visually through figures, but it does not include a formally labeled pseudocode block or algorithm. |
| Open Source Code | Yes | Code and dataset are available at https://github.com/Zhaozixiang1228/IF-FILM. |
| Open Datasets | Yes | MSRS (Tang et al., 2022c), M3FD (Liu et al., 2022a) and RoadScene (Xu et al., 2020a) datasets for the infrared-visible image fusion (IVF) task, the Harvard medical dataset (Johnson & Becker) for the medical image fusion (MIF) task, the RealMFF (Zhang et al., 2020a) and Lytro (Nejati et al., 2015) datasets for the multi-focus image fusion (MFF) task, and the SICE (Cai et al., 2018) and MEFB (Zhang, 2021a) datasets for the multi-exposure image fusion (MEF) task. ... Code and dataset are available at https://github.com/Zhaozixiang1228/IF-FILM. |
| Dataset Splits | Yes | MSRS dataset: 1083 pairs for IVF training and 361 pairs for IVF testing; RoadScene dataset: 70 pairs for IVF validation and 70 pairs for IVF testing; SICE dataset: 499 pairs for MEF training and 90 pairs for MEF testing; RealMFF dataset: 639 pairs for MFF training and 71 pairs for MFF testing. |
| Hardware Specification | Yes | A machine with eight NVIDIA GeForce RTX 3090 GPUs is utilized for our experiments. |
| Software Dependencies | No | The paper mentions the use of Adam optimizer, Restormer blocks, and specific model architectures, but it does not provide specific version numbers for software dependencies such as deep learning frameworks (e.g., PyTorch, TensorFlow) or programming languages. |
| Experiment Setup | Yes | We train the network for 300 epochs using the Adam optimizer with a batch size of 16. The initial learning rate is 1e-4 and is decreased by a factor of 0.5 every 50 epochs. We incorporate Restormer blocks (Zamir et al., 2022) in both the language-guided vision encoder V(·) and the vision feature decoder D(·), with each block having 8 attention heads and a dimensionality of 64. M and N, representing the number of blocks in V(·) and D(·), are set to 2 and 3, respectively. |
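
The learning-rate schedule quoted in the Experiment Setup row can be sketched in a few lines of plain Python. This is only an illustration and is not taken from the authors' repository: the names `TrainConfig` and `lr_at_epoch` are hypothetical, and the sketch assumes the decay is applied at epoch boundaries (0-indexed epochs 50, 100, ...), which is how a step scheduler such as PyTorch's `StepLR(step_size=50, gamma=0.5)` behaves. The paper does not state the framework or the exact boundary convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Values quoted in the Experiment Setup row; field names are hypothetical."""
    epochs: int = 300
    batch_size: int = 16
    initial_lr: float = 1e-4
    decay_factor: float = 0.5   # multiply the learning rate by this factor...
    decay_every: int = 50       # ...once every this many epochs


def lr_at_epoch(epoch: int, cfg: TrainConfig = TrainConfig()) -> float:
    """Learning rate in effect during a 0-indexed epoch.

    Assumes step decay at epoch boundaries; this convention is an
    assumption, not something the paper specifies.
    """
    if not 0 <= epoch < cfg.epochs:
        raise ValueError(f"epoch must be in [0, {cfg.epochs}), got {epoch}")
    num_decays = epoch // cfg.decay_every
    return cfg.initial_lr * cfg.decay_factor ** num_decays


if __name__ == "__main__":
    # Print the rate at the start of each decay stage.
    for epoch in range(0, TrainConfig().epochs, TrainConfig().decay_every):
        print(f"epoch {epoch:3d}: lr = {lr_at_epoch(epoch):.3e}")
```

Under these assumptions the rate is halved five times over the 300 epochs, so the final 50 epochs run at 1e-4 × 0.5⁵ = 3.125e-6.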