Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
FOCoOp: Enhancing Out-of-Distribution Robustness in Federated Prompt Learning for Vision-Language Models
Authors: Xinting Liao, Weiming Liu, Jiaming Qian, Pengyang Zhou, Jiahe Xu, Wenjie Wang, Chaochao Chen, Xiaolin Zheng, Tat-Seng Chua
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The paper includes a dedicated "Experiments" section (Section 4) and subsections like "Experimental Setup", "Performance Evaluation", and "Ablation Studies". It evaluates performance metrics (ACC, CACC, FPR95, AUROC) on multiple real-world datasets (e.g., CIFAR-100, Tiny ImageNet, Food101, DomainNet) and makes comparisons with various baselines. This clearly indicates empirical studies with data analysis. |
| Researcher Affiliation | Academia | All listed affiliations, 'Zhejiang University', 'University of Science and Technology of China', and 'National University of Singapore', are academic institutions. The correspondence email provided ('EMAIL') also belongs to an academic domain (.edu.cn). |
| Pseudocode | Yes | The paper contains a section B titled "Algorithm" which includes "Algorithm 1 Training procedure of FOCoOp", "Algorithm 2 Training procedure of Bi-level Distributional Robustness Optimization", and "Algorithm 3 Training procedure of Semi-unbalanced optimal transport based prompt calibration". |
| Open Source Code | Yes | The abstract states: "The project is available at GitHub." |
| Open Datasets | Yes | The paper explicitly lists and cites numerous publicly available datasets in Section 4.1 "Experimental Setup" and Appendix D "Datasets and Implementation Details", including: CIFAR-100 (Krizhevsky et al., 2009), Tiny ImageNet (Le & Yang, 2015), CIFAR-100-C (Hendrycks & Dietterich, 2018), iNaturalist (Van Horn et al., 2018), iSUN (Xiao et al., 2010), Places (Zhou et al., 2017), Texture (Cimpoi et al., 2014b), Food101 (Bossard et al., 2014), DTD (Cimpoi et al., 2014a), Caltech101 (Fei-Fei et al., 2004), Flowers (Nilsback & Zisserman, 2008), Oxford Pet (Parkhi et al., 2012), DomainNet (Peng et al., 2019), and Office-Caltech10 (Gong et al., 2012). |
| Dataset Splits | Yes | The paper specifies dataset splits and strategies: "We simulate heterogeneous distribution following both Dirichlet and Pathological settings (McMahan et al., 2017; Li et al., 2020) on CIFAR-100 (Krizhevsky et al., 2009) and Tiny ImageNet (Le & Yang, 2015)". It also specifies the client distribution: "we set local training epoch E = 2, communication round T = 25, and the number of clients K = 10 for full participation. While in cross-device setting, we choose local training epochs E = 2, communication rounds T = 100, and K = 100 for 10% participation." Furthermore, for domain generalization, it states, "by leave-one-domain-out validation strategy (Nguyen et al., 2022b). Specifically, for N − 1 domains of one dataset, we train each client with distinct domain data, and test its model generalization on the whole target data of the remaining one domain." |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models (e.g., NVIDIA A100, RTX 2080 Ti), CPU models, or cloud computing instance specifications used for running the experiments. It only mentions using 'ViT-B/16 CLIP models' and 'ResNet50' as model architectures. |
| Software Dependencies | No | The paper mentions using 'ViT-B/16 CLIP models' and 'ResNet50' as part of its methodology, but it does not specify any software dependencies with version numbers, such as Python, PyTorch, TensorFlow, or CUDA versions. |
| Experiment Setup | Yes | The paper provides specific experimental setup details in Section 4.1 "Implementation Details and Evaluation Metrics": "We set the learnable prompt vectors with length as 16, embedding size as 512, class token position as end, and random initialization. We choose 1 prompt per class for both local and global ID prompts, and 100 OOD prompts in total." It also specifies training parameters: "we set local training epoch E = 2, communication round T = 25, and the number of clients K = 10 for full participation. While in cross-device setting, we choose local training epochs E = 2, communication rounds T = 100, and K = 100 for 10% participation." Furthermore, Section 4.2 and Figure 7 discuss sensitivity analyses for hyperparameters like ρ, α, τ1, and τ2, indicating their specific values were used. |
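The Dirichlet-based heterogeneous split quoted above is a standard way to simulate non-IID federated clients. As a rough illustration only (the paper's exact partitioning code is not reproduced here; the function name, `alpha` value, and NumPy-based implementation are assumptions), each class's samples can be divided across clients with proportions drawn from a Dirichlet distribution, where a smaller concentration parameter yields more skewed, heterogeneous clients:

```python
import numpy as np

def dirichlet_partition(labels, num_clients=10, alpha=0.5, seed=0):
    """Assign sample indices to clients with per-class proportions
    drawn from Dirichlet(alpha). Smaller alpha -> more non-IID."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        # Shuffle the indices of class c, then split them according
        # to a Dirichlet-sampled proportion vector over clients.
        idx = rng.permutation(np.where(labels == c)[0])
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return [np.array(ci) for ci in client_indices]
```

With `alpha` near zero most of a class lands on one or two clients, while a large `alpha` approaches a uniform (IID) split; this matches the role the Dirichlet setting plays in the paper's cross-silo (K = 10) and cross-device (K = 100) configurations.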