Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Probabilistic Stability Guarantees for Feature Attributions

Authors: Helen Jin, Anton Xue, Weiqiu You, Surbhi Goel, Eric Wong

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate SCA on vision and language tasks and demonstrate the effectiveness of soft stability in measuring the robustness of explanation methods. [...] (Section 5, Experiments) We evaluate the advantages of SCA over MuS, which is currently the only other stability certification algorithm. We also study how stability guarantees vary across vision and language tasks, as well as across explanation methods.
Researcher Affiliation | Academia | Helen Jin (University of Pennsylvania); Anton Xue (University of Texas at Austin); Weiqiu You (University of Pennsylvania); Surbhi Goel (University of Pennsylvania); Eric Wong (University of Pennsylvania)
Pseudocode | Yes | Our key insight is that both soft and hard stability can be certified by directly estimating the stability rate through sampling. This leads to a simple algorithm, illustrated in Figure 4 and formalized below: $\hat{\tau}_r = \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}\big[f(x \odot \alpha^{(i)}) = f(x \odot \alpha)\big]$, where $\alpha^{(1)}, \ldots, \alpha^{(N)} \sim \Delta_r(\alpha)$ are sampled i.i.d. (3) Figure 4: The stability certification algorithm (SCA). Given an explanation $\alpha \in \{0, 1\}^n$ for a classifier $f$ and input $x \in \mathbb{R}^n$, we estimate the stability rate $\tau_r$ as follows. First, sample perturbed masks $\alpha' \sim \Delta_r(\alpha)$ uniformly with replacement. Then, compute the empirical stability rate $\hat{\tau}_r$, defined as the fraction of samples that preserve the prediction: $\hat{\tau}_r = \frac{1}{N} \sum_{\alpha'} \mathbf{1}[f(x \odot \alpha') = f(x \odot \alpha)]$. With a properly chosen sample size $N$, both hard and soft stability can be certified with statistical guarantees.
Open Source Code | Yes | Equal contribution. Code is available at: https://github.com/helenjin/soft_stability/
Open Datasets | Yes | For datasets, we used a 2000-image subset of ImageNet (2 images per class) and six subsets of TweetEval (emoji, emotion, hate, irony, offensive, sentiment), totaling 10653 samples.
Dataset Splits | No | For datasets, we used a 2000-image subset of ImageNet (2 images per class) and six subsets of TweetEval (emoji, emotion, hate, irony, offensive, sentiment), totaling 10653 samples. No specific train/test/validation splits, percentages, or counts are provided.
Hardware Specification | Yes | We used a cluster with NVIDIA GeForce RTX 3090 and NVIDIA RTX A6000 GPUs.
Software Dependencies | No | The paper mentions using models like Vision Transformer (ViT), ResNet50/18, RoBERTa, and the exlib implementation for attribution methods, but does not provide specific version numbers for any software components or underlying frameworks.
Experiment Setup | Yes | Certifying Stability with SCA: We used SCA (Equation (3)) for certifying soft stability (Theorem 3.1) with parameters of ε = δ = 0.1, for a sample size of N = 150. We use the same N when certifying hard stability via SCA-hard (Theorem 3.2). We selected the top-25% of features as the explanation. [...] where we used 32 Bernoulli samples to compute smoothing (Definition 4.1). [...] We used 64 Bernoulli samples to compute smoothing (Definition 4.1).
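The sampling procedure quoted in the Pseudocode row and the N = 150 setting quoted in the Experiment Setup row can be illustrated with a minimal Python sketch. It is not taken from the authors' repository, and every function name in it is hypothetical. It rests on two assumptions that the excerpts above do not state: that the perturbation set consists of masks that keep every feature selected by α and switch on at most r additional features, and that the sample size follows a two-sided Hoeffding bound, N ≥ log(2/δ) / (2ε²). The second assumption does reproduce N = 150 for ε = δ = 0.1.

```python
import math
import random


def hoeffding_sample_size(eps, delta):
    """Smallest N with 2 * exp(-2 * N * eps**2) <= delta.

    Assumption: a two-sided Hoeffding bound; the excerpt only reports N = 150.
    """
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))


def sample_perturbed_mask(alpha, r, rng):
    """Draw one mask uniformly from the assumed perturbation set.

    Assumption: the set contains every binary mask that keeps all features
    selected by alpha and switches on at most r additional features.
    """
    zeros = [i for i, a in enumerate(alpha) if a == 0]
    max_k = min(r, len(zeros))
    # There are C(len(zeros), k) masks that add exactly k features, so k is
    # weighted by that count to make the draw uniform over the whole set.
    # (For very large masks these integer weights may need rescaling.)
    weights = [math.comb(len(zeros), k) for k in range(max_k + 1)]
    k = rng.choices(range(max_k + 1), weights=weights)[0]
    perturbed = list(alpha)
    for i in rng.sample(zeros, k):
        perturbed[i] = 1
    return perturbed


def estimate_stability_rate(f, x, alpha, r, n_samples, rng):
    """Fraction of sampled masks whose masked input keeps the original prediction."""
    def apply_mask(mask):
        return [xi * mi for xi, mi in zip(x, mask)]

    base_prediction = f(apply_mask(alpha))
    hits = 0
    for _ in range(n_samples):
        perturbed = sample_perturbed_mask(alpha, r, rng)
        if f(apply_mask(perturbed)) == base_prediction:
            hits += 1
    return hits / n_samples


if __name__ == "__main__":
    rng = random.Random(0)
    x = [0.9, -0.4, 0.3, 0.8, -0.7, 0.1]
    alpha = [1, 0, 0, 1, 0, 0]  # hypothetical explanation mask

    def toy_classifier(z):
        # Hypothetical stand-in for a real model: thresholds the feature sum.
        return int(sum(z) > 1.0)

    n = hoeffding_sample_size(eps=0.1, delta=0.1)
    rate = estimate_stability_rate(toy_classifier, x, alpha, r=2, n_samples=n, rng=rng)
    print(f"N = {n}, estimated stability rate = {rate:.3f}")
```

Under the Hoeffding assumption, the printed estimate lies within ε of the true stability rate with probability at least 1 − δ. A real evaluation would replace toy_classifier with the model under study and alpha with the binarized output of an attribution method.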