Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Enhancing Graph Classification Robustness with Singular Pooling

Authors: Sofiane Ennadir, Oleg Smirnov, Yassine Abbahaddou, Lele Cao, Johannes F. Lutzeyer

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirical results on real-world benchmarks show that RS-Pool provides better robustness than the considered pooling methods when subject to state-of-the-art adversarial attacks while maintaining competitive clean accuracy. Our code is publicly available at: https://github.com/king/rs-pool. We evaluate RS-Pool against classical flat pooling methods, including Average, Max, and Sum pooling, as well as several advanced approaches. We include Self-Attention Graph pooling (SAG) [27], which uses node features and graph structure to compute attention scores and retain informative nodes, and TopK pooling (TopK-P) [16], which learns to rank nodes via a projection vector and selects a fixed fraction by score. We additionally considered the Bit-Flip Attacks (BFAs) [26, 25] in Appendix E.6. We conduct experiments on standard graph classification datasets from the TUDataset benchmark [32], spanning diverse domains.
Researcher Affiliation	Collaboration	Sofiane Ennadir (King AI Labs, Microsoft Gaming); Oleg Smirnov (King AI Labs, Microsoft Gaming); Yassine Abbahaddou (LIX, École Polytechnique, IP Paris); Lele Cao (King AI Labs, Microsoft Gaming); Johannes F. Lutzeyer (LIX, École Polytechnique, IP Paris)
Pseudocode	Yes	Algorithm 1 RS-Pool Forward Pass. Require: H ∈ R^(n×d), τ ∈ R_(>0), K ∈ N_(>0). 1: v ← random unit vector in R^d; 2: for t = 1 to K do; 3: w ← Hv; 4: v ← Hᵀw; 5: v ← v / ‖v‖₂; 6: end for; 7: return τv
Open Source Code	Yes	Our code is publicly available at: https://github.com/king/rs-pool.
Open Datasets	Yes	We conduct experiments on standard graph classification datasets from the TUDataset benchmark [32], spanning diverse domains. In bioinformatics graphs (PROTEINS, D&D, ENZYMES), small changes to residue contact links can influence biological property predictions. In molecular graphs (NCI1, ER_MD), altering a bond may change the predicted molecular function. For social networks (IMDB-B, REDDIT-B), edge perturbations, such as fake user interactions, can flip the predicted graph label. In image-based graphs (MSRC_9), local edits to image patches can lead to misclassification. For all datasets, we use the public train/validation/test splits when available [14]; otherwise, we adopt the same evaluation protocol and report the specific folds used.
Dataset Splits	Yes	For all datasets, we use the public train/validation/test splits when available [14]; otherwise, we adopt the same evaluation protocol and report the specific folds used. Model evaluation follows the standardized protocol of [14], using 10-fold cross-validation. When public folds are available, we use them directly; otherwise, we generate new folds following the same procedure.
Hardware Specification	Yes	The experiments have been run on an NVIDIA A40 GPU and we estimate the total number of hours of computing to be around 200 hours.
Software Dependencies	No	The code is developed using PyTorch [33], with a dense implementation of GNNs, which is required for executing the considered adversarial attacks. For all baseline pooling methods, we adapt the official implementations from PyTorch Geometric (PyG) [15], released under the MIT license.
Experiment Setup	Yes	All models are trained using the Adam optimizer [23] with a learning rate of 1e-3 for 100 epochs. We set the hidden feature dimension to 32 and we used ReLU as our activation function for all the models. For all the experiments, we used the same initialization distribution and the same number of training epochs to ensure fairness following insights from previous work [12]. Additionally, to account for variability due to random initialization, each experiment is repeated 10 times, and we report the mean and standard deviation of the results.
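
The Pseudocode row quotes Algorithm 1 (RS-Pool Forward Pass) in flattened form. Read as a power iteration on HᵀH, which is our interpretation of the extracted text and not a copy of the authors' released code, it can be sketched in dependency-free Python as follows. The function name `rs_pool`, the `seed` argument, the default values, and the zero-norm guard are illustrative assumptions that do not appear in the paper excerpt.

```python
import math
import random


def rs_pool(H, tau=1.0, K=10, seed=0):
    """Sketch of Algorithm 1: K power-iteration steps on H^T H, scaled by tau.

    H is an n x d node-feature matrix given as a list of n rows.
    Returns a length-d list approximating tau times the dominant
    right singular vector of H (up to sign).
    """
    n, d = len(H), len(H[0])
    rng = random.Random(seed)  # fixed seed: an assumption, for repeatability

    def normalise(vec):
        norm = math.sqrt(sum(x * x for x in vec))
        if norm == 0.0:  # guard added for this sketch (e.g. all-zero H)
            return [0.0] * len(vec)
        return [x / norm for x in vec]

    # Step 1: random unit vector in R^d.
    v = normalise([rng.gauss(0.0, 1.0) for _ in range(d)])

    for _ in range(K):
        # Step 3: w = H v, a vector of length n.
        w = [sum(H[i][j] * v[j] for j in range(d)) for i in range(n)]
        # Step 4: v = H^T w, a vector of length d.
        v = [sum(H[i][j] * w[i] for i in range(n)) for j in range(d)]
        # Step 5: rescale v to unit L2 norm.
        v = normalise(v)

    # Step 7: scale the estimated singular direction by tau.
    return [tau * x for x in v]


if __name__ == "__main__":
    # Toy 2 x 2 input whose dominant right singular vector is +/-[1, 0].
    pooled = rs_pool([[3.0, 0.0], [0.0, 1.0]], tau=2.0, K=50)
    print(pooled)
```

For the toy input the result is close to ±[2, 0]: the first axis carries the largest singular value, and the sign depends on the random starting vector. A practical implementation would more likely use batched PyTorch tensor operations, in line with the Software Dependencies row.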