Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Privacy-Preserving Personalized Federated Prompt Learning for Multimodal Large Language Models
Authors: Linh Tran, Wei Sun, Stacy Patterson, Ana Milanova
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the effectiveness of our approach over other benchmarks. 1 INTRODUCTION ... 4 EXPERIMENTS ... 4.2 PERFORMANCE RESULTS ... 4.3 ABLATION STUDY |
| Researcher Affiliation | Collaboration | Linh Tran1 Wei Sun2 Stacy Patterson1 Ana Milanova1 1Rensselaer Polytechnic Institute 2 IBM Research |
| Pseudocode | Yes | Algorithm 1 DP-FPL |
| Open Source Code | No | The paper does not provide an explicit statement about releasing their own source code, nor does it provide a direct link to a code repository for the methodology described in the paper. It only links to a third-party resource for data splitting. |
| Open Datasets | Yes | Datasets. We select four visual classification datasets to investigate the task of balancing personalization, generalization and privacy: Caltech101 (Fei-Fei et al., 2004), Oxford Pets (Parkhi et al., 2012), Oxford Flowers (Nilsback & Zisserman, 2008) and Food101 (Bossard et al., 2014). ... The dataset is available for download at http://www.vision.caltech.edu/Image_Datasets/Caltech101/101_ObjectCategories.tar.gz. ... can be downloaded at https://www.robots.ox.ac.uk/~vgg/data/pets/data/images.tar.gz. ... can be retrieved at https://www.robots.ox.ac.uk/~vgg/data/flowers/102/102flowers.tgz. ... are available for download at https://data.vision.ee.ethz.ch/cvl/datasets_extra/food-101/. ... CIFAR-100 (Krizhevsky et al., 2009) is another large-scale dataset containing images of 100 different object classes. The dataset is available for download via torchvision.datasets.CIFAR10. |
| Dataset Splits | Yes | We utilize the pathological data split among 10 clients. ... The dataset contains 6,593 samples, including 4,128 training samples and 2,465 testing samples. ... The dataset is divided into a training set of 2,944 images and a test set of 3,669 images. ... There are 6,556 total images, including 4,093 training images and 2,463 testing images. ... we split the dataset into a train set of size 50,500 and a test set of size 30,300. ... The dataset consists of 60,000 32x32 images, with 6,000 images per class. We divide the dataset into a training set of 50,000 images and a test set of 10,000 images. |
| Hardware Specification | Yes | We run our experiments on a computer cluster; each node has 6x NVIDIA Tesla V100 GPUs with 32 GiB of memory each, 512 GiB RAM, and 2x IBM Power 9 processors. |
| Software Dependencies | No | The paper mentions software components such as the Vision Transformer ViT-B16, ResNet50, CLIP, and the SGD optimizer, but does not specify their version numbers, nor the versions of the programming language or libraries used (e.g., Python or PyTorch). |
| Experiment Setup | Yes | For each dataset, we run experiments with N = 10 clients for T = 100 global training rounds. We use batch size |B| = 32 for training and |B| = 100 for testing. We set the global learning rate ηG = 0.0001 and local learning rate ηL = 0.0001 with SGD optimizer. ... For CIFAR-100, we adopt ResNet50 (He et al., 2016) as the backbone model and run experiments with N = 25 and N = 50 clients for T = 200 global training rounds. We use batch size |B| = 32 for training and |B| = 100 for testing. We set the global learning rate ηG = 0.0001 and local learning rate ηL = 0.0001 with SGD optimizer. ... For the prompt learner, the length of prompt vectors is b = 16 with a dimension of d = 512, and the token position is end with random initialization. For the factorization process, we experiment with four different factorization ranks: 1, 2, 4, 8. We consider three different DP noises with privacy level from low to high: ϵ = {0.4, 0.2, 0.1}. The clipping threshold is chosen to be Cth = 10 for both GDP and LDP applications. |
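The quoted setup mentions a clipping threshold Cth = 10 applied for both GDP and LDP. As illustration only, a generic differentially private clip-and-noise step looks like the sketch below; this is not the paper's Algorithm 1 (DP-FPL), and the noise scale `sigma` here is a free parameter rather than the paper's calibration for ϵ ∈ {0.4, 0.2, 0.1}.

```python
import numpy as np

def clip_and_noise(grad, clip_threshold=10.0, sigma=1.0, rng=None):
    """Generic DP-style privatization of a gradient vector.

    1. Rescale `grad` so its L2 norm is at most `clip_threshold`.
    2. Add Gaussian noise with standard deviation sigma * clip_threshold.

    NOTE: a hedged sketch, not the paper's DP-FPL mechanism; the
    actual noise calibration to a target epsilon is defined there.
    """
    rng = np.random.default_rng() if rng is None else rng
    norm = np.linalg.norm(grad)
    clipped = grad * min(1.0, clip_threshold / max(norm, 1e-12))
    noise = rng.normal(0.0, sigma * clip_threshold, size=grad.shape)
    return clipped + noise
```

With `sigma=0.0` the function reduces to pure norm clipping, which makes the bound easy to check: a gradient of norm 50 comes back with norm exactly 10, while a gradient already below the threshold is returned unchanged.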