Image Fusion via Vision-Language Model

Authors: Zixiang Zhao, Lilun Deng, Haowen Bai, Yukun Cui, Zhipeng Zhang, Yulun Zhang, Haotong Qin, Dongdong Chen, Jiangshe Zhang, Peng Wang, Luc Van Gool

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | FILM has shown promising results in four image fusion tasks: infrared-visible, medical, multi-exposure, and multi-focus image fusion. We also propose a vision-language dataset containing ChatGPT-generated paragraph descriptions for the eight image fusion datasets across four fusion tasks, facilitating future research in vision-language model-based image fusion.
Researcher Affiliation | Academia | 1 Xi'an Jiaotong University, China; 2 ETH Zürich, Switzerland; 3 Northwestern Polytechnical University, China; 4 MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, China; 5 Heriot-Watt University, United Kingdom; 6 KU Leuven, Belgium; 7 INSAIT, Bulgaria.
Pseudocode | No | The paper describes the workflow and components of the FILM algorithm verbally and visually through figures, but it does not include a formally labeled pseudocode block or algorithm.
Open Source Code | Yes | Code and dataset are available at https://github.com/Zhaozixiang1228/IF-FILM.
Open Datasets | Yes | MSRS (Tang et al., 2022c), M3FD (Liu et al., 2022a) and RoadScene (Xu et al., 2020a) datasets for the infrared-visible image fusion (IVF) task, the Harvard medical dataset (Johnson & Becker) for the medical image fusion (MIF) task, the Real-MFF (Zhang et al., 2020a) and Lytro (Nejati et al., 2015) datasets for the multi-focus image fusion (MFF) task, and the SICE (Cai et al., 2018) and MEFB (Zhang, 2021a) datasets for the multi-exposure image fusion (MEF) task. ... Code and dataset are available at https://github.com/Zhaozixiang1228/IF-FILM.
Dataset Splits | Yes | MSRS dataset: 1,083 pairs for IVF training and 361 pairs for IVF testing. RoadScene dataset: 70 pairs for IVF validation and 70 pairs for IVF testing. SICE dataset: 499 pairs for MEF training and 90 pairs for MEF testing. Real-MFF dataset: 639 pairs for MFF training and 71 pairs for MFF testing.
Hardware Specification | Yes | A machine with eight NVIDIA GeForce RTX 3090 GPUs is utilized for our experiments.
Software Dependencies | No | The paper mentions the Adam optimizer, Restormer blocks, and specific model architectures, but it does not provide version numbers for software dependencies such as deep learning frameworks (e.g., PyTorch, TensorFlow) or programming languages.
Experiment Setup | Yes | We train the network for 300 epochs using the Adam optimizer, with an initial learning rate of 1e-4 decreasing by a factor of 0.5 every 50 epochs. The Adam optimization strategy is employed with the batch size set to 16. We incorporate Restormer blocks (Zamir et al., 2022) in both the language-guided vision encoder V(·) and the vision feature decoder D(·), with each block having 8 attention heads and a dimensionality of 64. M and N, representing the number of blocks in V(·) and D(·), are set to 2 and 3, respectively.
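The step-decay learning-rate schedule in the Experiment Setup row (initial rate 1e-4, halved every 50 epochs over 300 epochs) can be sketched in plain Python. This is a minimal illustration of the schedule as described; the helper name is ours, not from the paper:

```python
def film_lr(epoch, base_lr=1e-4, decay=0.5, step=50):
    """Step-decay schedule: multiply the learning rate by `decay`
    once per completed `step`-epoch stage (halving every 50 epochs)."""
    return base_lr * decay ** (epoch // step)

# Rate at the start of each 50-epoch stage across the 300-epoch run:
# epoch 0 -> 1e-4, epoch 50 -> 5e-5, ..., epoch 250 -> 1e-4 * 0.5**5
stage_rates = [film_lr(e) for e in range(0, 300, 50)]
```

In a PyTorch training loop this corresponds to `torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.5)` wrapped around an Adam optimizer with `lr=1e-4` (framework and versions are not specified in the paper).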