Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Multi-Modal Interactive Agent Layer for Few-Shot Universal Cross-Domain Retrieval and Beyond

Authors: Kaixiang Chen, Pengfei Fang, Hui Xue

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on three FS-UCDR benchmarks and eleven few-shot classification datasets demonstrate that MAIL achieves state-of-the-art performance while maintaining superior parameter efficiency and CLIP's inference efficiency.
Researcher Affiliation	Academia	Kaixiang Chen1,2, Pengfei Fang1,2, Hui Xue1,2 1School of Computer Science and Engineering, Southeast University 2Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, China EMAIL
Pseudocode	Yes	Alg. 1, 2, 3 provides the pseudo-codes for MAIL in a PyTorch-like style.
Open Source Code	No	We will provide the code if the paper is accepted.
Open Datasets	Yes	We conduct experiments on three benchmark datasets: DomainNet [38], Sketchy [41, 32], and TU-Berlin [12, 53]. ... We conduct the base-to-novel generalization and cross-dataset evaluations across 11 diverse image classification datasets: ImageNet [9], Caltech101 [14], Oxford Pets [36], Stanford Cars [29], Flowers102 [34], Food101 [2], FGVCAircraft [33], SUN397 [48], UCF101 [35], DTD [8], and EuroSAT [17].
Dataset Splits	Yes	All experiments utilize a 2-shot setting, i.e., only 2 × N_S training examples per category, where N_S denotes the number of source domains. ... In this evaluation, the dataset's categories are partitioned equally into base and novel classes. ... A 16-shot training strategy is employed, where 16 samples per class are randomly selected. ... For base-to-novel evaluation, we adopt a batch size of 64 for the larger datasets (ImageNet and SUN397) and 4 for all others.
Hardware Specification	Yes	All experiments are conducted on a single NVIDIA RTX 4090 GPU with mixed-precision training to accelerate computation. ... As detailed in Appendix C, a single 24GB GPU is sufficient to reproduce all experiments.
Software Dependencies	No	The paper mentions 'PyTorch-like style' for pseudocode, 'CLIP' as a model, and 'Adam optimizer' but does not specify version numbers for any software libraries or frameworks. Therefore, it does not provide specific ancillary software details.
Experiment Setup	Yes	For DomainNet, training is limited to 1 epoch, whereas Sketchy and TU-Berlin are trained for 20 epochs, with early stopping applied after 2 epochs. ... The optimization is performed using the Adam optimizer with a learning rate of 2e-4 and a cosine decay schedule. All experiments are conducted on a single NVIDIA RTX 4090 GPU with mixed-precision training to accelerate computation. To ensure reproducibility, we follow the setting in ProS [13] and fix the random seed to 0. A 2-shot training strategy is employed, where two samples per class per domain are randomly selected.
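
The 2-shot selection and cosine decay schedule quoted in the Experiment Setup row can be illustrated with a minimal Python sketch. This is not the authors' code (none has been released): the function names `sample_few_shot` and `cosine_lr`, the `(sample_id, class_label, domain)` tuple layout, and the decay-to-zero endpoint of the schedule are all assumptions made for illustration. Only the seed of 0, the two samples per class per domain, and the base learning rate of 2e-4 come from the quoted text.

```python
import math
import random
from collections import defaultdict


def sample_few_shot(items, k=2, seed=0):
    """Pick k random samples per (class, domain) pair with a fixed seed.

    `items` is an iterable of (sample_id, class_label, domain) tuples;
    this layout is a hypothetical one chosen for the sketch.
    """
    rng = random.Random(seed)  # local RNG so the global state is untouched
    groups = defaultdict(list)
    for sample_id, label, domain in items:
        groups[(label, domain)].append(sample_id)

    selected = {}
    for key in sorted(groups):  # sorted keys keep the draw order deterministic
        pool = groups[key]
        selected[key] = rng.sample(pool, min(k, len(pool)))
    return selected


def cosine_lr(step, total_steps, base_lr=2e-4):
    """Cosine decay from base_lr at step 0 to 0 at total_steps.

    Decaying all the way to zero is an assumption; the source only says
    that a cosine decay schedule is used.
    """
    progress = min(step, total_steps) / total_steps
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Toy data: 2 source domains, 2 classes, 5 samples each.
    items = [
        (f"{d}_{c}_{i}", c, d)
        for d in ("sketch", "photo")
        for c in ("cat", "dog")
        for i in range(5)
    ]
    for key, ids in sample_few_shot(items).items():
        print(key, ids)
    print(cosine_lr(0, 100), cosine_lr(50, 100), cosine_lr(100, 100))
```

With two source domains, each class ends up with 2 × 2 = 4 training samples, which matches the 2 × N_S count described in the Dataset Splits row.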