Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

MixPrompt: Efficient Mixed Prompting for Multimodal Semantic Segmentation

Authors: Zhiwei Hao, Zhongyu Xiao, Jianyuan Guo, Li Shen, Yong Luo, Han Hu, Dan Zeng

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments across NYU Depth V2, SUN-RGBD, MFNet, and DELIVER datasets show that MixPrompt achieves improvements of 4.3, 1.1, 0.4, and 1.1 mIoU, respectively, over two-branch baselines, while using nearly half the parameters.
Researcher Affiliation	Academia	Zhiwei Hao1, Zhongyu Xiao1, Jianyuan Guo2, Li Shen3, Yong Luo4, Han Hu1, Dan Zeng5. 1School of Information and Electronics, Beijing Institute of Technology. 2Department of Computer Science, City University of Hong Kong. 3School of Cyber Science and Technology, Sun Yat-sen University. 4School of Computer Science, Wuhan University. 5School of Communication and Information Engineering, Shanghai University.
Pseudocode	Yes	The detailed algorithmic procedure for the mixed prompting module is presented in Algorithm 1. Algorithm 1 Mixed Prompting Module
Open Source Code	Yes	The code is available at https://github.com/xiaoshideta/MixPrompt.
Open Datasets	Yes	We evaluate MixPrompt on four benchmark datasets: NYU Depth V2 [15], SUN-RGBD [19], MFNet [10], and DELIVER [8].
Dataset Splits	Yes	DELIVER [8] comprises 3,983 training and 2,005 testing samples with RGB, Depth, Event, and Lidar modalities across 25 categories.
Hardware Specification	Yes	All experiments are conducted on NVIDIA GeForce RTX 3090 GPUs.
Software Dependencies	No	For the NYU Depth V2 dataset, we use SGD [56] with a weight decay of 5 × 10^-4 and an initial learning rate of 0.04. The model is trained for 500 epochs with a batch size of 8. For the SUN-RGBD dataset, we adopt the AdamW optimizer [57] with 100 epochs and an initial learning rate of 0.005. For the MFNet dataset, we train for 500 epochs with the AdamW optimizer, a learning rate of 6 × 10^-4, and a batch size of 4.
Experiment Setup	Yes	Table 10: Training configurations for different datasets. LR: Learning Rate, WD: Weight Decay, Mom: Momentum. Columns: Dataset | Input Size | Batch Size | Epochs | Optimizer | LR | WD | Mom. Rows: NYUD-v2 | 480×640 | 8 | 500 | SGD | 4e-2 | 0.0005 | 0.9; SUN-RGBD | 530×730 | 4 | 100 | AdamW | 5e-3 | 0.01 | (0.9, 0.999); MFNet | 480×640 | 4 | 500 | AdamW | 6e-4 | 0.01 | (0.9, 0.999); DELIVER | 1024×1024 | 2 | 200 | AdamW | 6e-5 | 0.01 | (0.9, 0.999).
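
The training configurations quoted from Table 10 could be captured in code roughly as follows. This is a minimal sketch for illustration, not taken from the MixPrompt repository: the names `TrainConfig`, `TRAIN_CONFIGS`, and `optimizer_kwargs` are hypothetical, the input sizes are stored in the order the table lists them (the table does not say which value is height and which is width), and the assumption that the "Mom" column holds SGD momentum for NYUD-v2 and AdamW betas for the other datasets is inferred from the values shown.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class TrainConfig:
    """One row of Table 10 (hypothetical container, not from the paper's code)."""
    input_size: Tuple[int, int]  # as listed in the table; axis order is not stated
    batch_size: int
    epochs: int
    optimizer: str               # "SGD" or "AdamW"
    lr: float
    weight_decay: float
    momentum: Union[float, Tuple[float, float]]  # SGD momentum or AdamW betas (assumed)


# Values copied from the Table 10 excerpt above.
TRAIN_CONFIGS: Dict[str, TrainConfig] = {
    "NYUD-v2": TrainConfig((480, 640), 8, 500, "SGD", 4e-2, 5e-4, 0.9),
    "SUN-RGBD": TrainConfig((530, 730), 4, 100, "AdamW", 5e-3, 0.01, (0.9, 0.999)),
    "MFNet": TrainConfig((480, 640), 4, 500, "AdamW", 6e-4, 0.01, (0.9, 0.999)),
    "DELIVER": TrainConfig((1024, 1024), 2, 200, "AdamW", 6e-5, 0.01, (0.9, 0.999)),
}


def optimizer_kwargs(cfg: TrainConfig) -> dict:
    """Build keyword arguments using the parameter names of
    torch.optim.SGD (lr, momentum, weight_decay) and
    torch.optim.AdamW (lr, betas, weight_decay)."""
    kwargs = {"lr": cfg.lr, "weight_decay": cfg.weight_decay}
    if cfg.optimizer == "SGD":
        kwargs["momentum"] = cfg.momentum
    else:
        kwargs["betas"] = cfg.momentum
    return kwargs
```

With PyTorch installed, the result could be passed along as `torch.optim.AdamW(model.parameters(), **optimizer_kwargs(TRAIN_CONFIGS["DELIVER"]))`. The excerpts do not specify a learning-rate schedule or library versions, so those are left out here.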