Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Practical Bayes-Optimal Membership Inference Attacks

Authors: Marcus Lassila, Johan Oestman, Khac-Hoang Ngo, Alexandre Graell i Amat

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate the attack performance of BASE and G-BASE across a range of datasets and model architectures. This section focuses on attacks against GNNs; results on i.i.d. data (CIFAR-10 and CIFAR-100) are presented in Appendix H.4. Open source code is available to reproduce our results. Figure 1: ROC curves of our attack and prior MIAs on the Flickr dataset, averaged over 10 GCN target models.
Researcher Affiliation	Collaboration	Marcus Lassila (1), Johan Östman (2), Khac-Hoang Ngo (3), Alexandre Graell i Amat (1). (1) Chalmers University of Technology; (2) AI Sweden; (3) Linköping University
Pseudocode	Yes	Algorithm 1 Shadow Model Training Procedure. 1: Input: Data population G, training algorithm T, and even number of shadow models 2N. 2: Φ ← ∅ 3: for k = 1 to N do 4: G_k ∼ Uniform(G), |G_k| = ½|G| 5: G_k^c = {z : z ∈ G, z ∉ G_k} 6: ϕ_k ← T(G_k) 7: ϕ_k^c ← T(G_k^c) 8: Φ ← Φ ∪ {ϕ_k, ϕ_k^c} 9: end for 10: return Φ
Open Source Code	Yes	Open source code is available to reproduce our results. Footnote 1: https://github.com/MarcusLassila/MIA-audit-GNN
Open Datasets	Yes	Experiments are conducted on 6 graph datasets: Cora, Citeseer, Pubmed, Flickr, Amazon Photo, and Github. Cora, Citeseer and Pubmed are citation networks previously used in node-level MIA work [15–17]. Results on i.i.d. data (CIFAR-10 and CIFAR-100) are presented in Appendix H.4.
Dataset Splits	Yes	The models are trained inductively on randomly-induced subgraphs containing 50% of the nodes, and 50% of the dataset is used as target samples, evenly split between members and non-members. To facilitate efficient MIA auditing in the online setting (see Appendix F for a discussion of online vs. offline settings), we adopt the shadow model training procedure proposed in [11] and also used in [13]. Specifically, each shadow model is trained on half of the data population (e.g., half the nodes in a graph dataset), such that each data sample is included in the training set of half of the models.
Hardware Specification	No	The paper currently does not include explicit details about the computational resources required; this will be provided in the supplemental material upon acceptance. The paper introduces a new framework and computational resource details are not crucial for understanding or replicating the main contributions and their impact.
Software Dependencies	No	Optimization is performed using Adam [35]. For each dataset and model, hyperparameters are selected via a grid search, including the learning rate, weight decay, number of training epochs, dropout rate, and dimension of the first GNN layer.
Experiment Setup	Yes	In particular, we search over {0.01, 0.001} for the learning rate, {0.0001, 0.00001} for the weight decay, and {0.0, 0.25, 0.5} for the dropout rate. For the hidden dimension of the first layer, we search in {32, 64, 128, 256, 512}, with 32 or 512 excluded depending on the dataset. The initial search space for the number of epochs is typically {20, 50, 100, 200, 400, 800, 1600}, and is sometimes later refined.
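
The shadow model training procedure quoted in the Pseudocode row (Algorithm 1) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `train_shadow_models`, the `train` callback standing in for the training algorithm T, and the `seed` parameter are hypothetical names introduced here. Unlike Algorithm 1, which returns only the set of models, the sketch also returns each model's training set so that membership can be looked up afterwards.

```python
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

Sample = TypeVar("Sample")
Model = TypeVar("Model")


def train_shadow_models(
    population: Sequence[Sample],
    train: Callable[[List[Sample]], Model],  # hypothetical stand-in for T
    n_pairs: int,                            # N in Algorithm 1 (2N models total)
    seed: int = 0,                           # assumption: added for repeatability
) -> Tuple[List[Model], List[List[Sample]]]:
    """Train 2N shadow models on complementary halves of the population."""
    rng = random.Random(seed)
    items = list(population)
    half = len(items) // 2  # for odd sizes the complement gets the extra sample

    models: List[Model] = []
    train_sets: List[List[Sample]] = []
    for _ in range(n_pairs):
        # G_k: a uniformly random half of the population (chosen by index).
        chosen = set(rng.sample(range(len(items)), half))
        subset = [z for i, z in enumerate(items) if i in chosen]
        # G_k^c: every sample that is not in G_k.
        complement = [z for i, z in enumerate(items) if i not in chosen]

        # Train one model on each half and add both to the collection.
        for data in (subset, complement):
            models.append(train(data))
            train_sets.append(data)
    return models, train_sets


if __name__ == "__main__":
    # Toy "training algorithm" that simply memorizes its training data.
    models, train_sets = train_shadow_models(range(10), frozenset, n_pairs=3)
    print(len(models), [len(s) for s in train_sets])
```

Because each pair of models is trained on a subset and its complement, every sample is a member for exactly one model of each pair, which matches the Dataset Splits row's statement that each data sample is included in the training set of half of the models.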