Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Impact of Dataset Properties on Membership Inference Vulnerability of Deep Transfer Learning

Authors: Marlon Tobaben, Hibiki Ito, Joonas Jälkö, Yuan He, Antti Honkela

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We analyze MIA vulnerability of fine-tuned neural networks both empirically and theoretically, the latter using a simplified model of fine-tuning. We complement the theoretical analysis with extensive experiments over many datasets with varying sizes, in the transfer learning setting for image classification tasks, and observe the same power-law.
Researcher Affiliation	Academia	Marlon Tobaben¹, Hibiki Ito², Joonas Jälkö¹, Yuan He¹, Antti Honkela¹; ¹Department of Computer Science, University of Helsinki, Finland; ²School of Informatics, Kyoto University, Japan
Pseudocode	No	The paper discusses various algorithms like LiRA and RMIA and their formulations, but it does not present any structured pseudocode or algorithm blocks with numbered steps.
Open Source Code	Yes	We provide the code for reproducing the experiments in an open repository: https://github.com/DPBayes/impact-dataset-properties-MI-vulnerability-deep-TL
Open Datasets	Yes	We base our experiments on a subset of the few-shot benchmark VTAB (Zhai et al., 2019) that achieves a test classification accuracy > 80% (see Table A2). ... Patch Camelyon (Veeling et al., 2018) ... CIFAR10 (Krizhevsky, 2009) ... EuroSAT (Helber et al., 2019) ... Pets (Parkhi et al., 2012) ... Resics45 (Cheng et al., 2017) ... CIFAR100 (Krizhevsky, 2009) ... We downloaded all datasets from TensorFlow Datasets (https://www.tensorflow.org/datasets) but Resics45 which required manual download.
Dataset Splits	Yes	Given the input D dataset we perform hyperparameter tuning by splitting the D into 70% train and 30% validation.
Hardware Specification	Yes	All experiments but the R-50 (FiLM) experiments are run on CPU with 8 cores and 16 GB of host memory. The R-50 (FiLM) experiments are significantly more expensive and utilise a NVIDIA V100 with 40 GB VRAM, 10 CPU cores and 64 GB of host memory.
Software Dependencies	No	We optimize the hyperparameters (batch size, learning rate and number of epochs) using the Optuna library (Akiba et al., 2019) with the Tree-structured Parzen Estimator (TPE; Bergstra et al., 2011) sampler with 20 iterations (more details in Appendix C.2). We provide the code for reproducing the experiments in an open repository. ... This set of hyperparameters is subsequently used to train all shadow models with the Adam optimizer (Kingma and Ba, 2015). While specific libraries and optimizers are mentioned, version numbers for these software dependencies are not provided.
Experiment Setup	Yes	Our hyperparameter tuning is heavily inspired by the comprehensive few-shot experiments by Tobaben et al. (2023). We utilise their hyperparameter tuning protocol as it has been proven to yield SOTA results for (DP) few-shot models. ... Table A1: Hyperparameter ranges used for the Bayesian optimization with Optuna (lower bound to upper bound): batch size, 10 to |D|; clipping norm, 0.2 to 10; epochs, 1 to 200; learning rate, 1e-7 to 1e-2.
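
The 70% train / 30% validation split quoted under Dataset Splits can be illustrated with a minimal sketch. It is not taken from the authors' repository: the function name `split_train_val`, the fixed seed, and the use of a plain random shuffle (rather than, say, a class-stratified split) are all assumptions made for illustration.

```python
import random


def split_train_val(dataset, train_frac=0.7, seed=0):
    """Shuffle `dataset` and split it into train and validation parts.

    Hypothetical helper: the random shuffle and the seed are assumptions,
    not details confirmed by the paper.
    """
    indices = list(range(len(dataset)))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    n_train = int(round(train_frac * len(dataset)))
    train = [dataset[i] for i in indices[:n_train]]
    val = [dataset[i] for i in indices[n_train:]]
    return train, val


# Stand-in for the input dataset D; real data would be (image, label) pairs.
D = list(range(100))
train, val = split_train_val(D)
print(len(train), len(val))
```

In a tuning loop of the kind described above, each candidate hyperparameter setting would be trained on `train` and scored on `val`; the scoring step is omitted here because the source does not specify it.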