Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Extending Adversarial Attacks to Produce Adversarial Class Probability Distributions

Authors: Jon Vadillo, Roberto Santana, Jose A. Lozano

JMLR 2023

Reproducibility Variable Result LLM Response
Research Type Experimental We also experimentally validate our approach for the spoken command classification task and the Tweet emotion classification task, two exemplary machine learning problems in the audio and text domain, respectively. Our results demonstrate that we can closely approximate any probability distribution for the classes while maintaining a high fooling rate and even prevent the attacks from being detected by label-shift detection methods.
Researcher Affiliation Academia Jon Vadillo EMAIL Department of Computer Science and Artificial Intelligence University of the Basque Country UPV/EHU 20018 Donostia, Spain. Roberto Santana EMAIL Department of Computer Science and Artificial Intelligence University of the Basque Country UPV/EHU 20018 Donostia, Spain. Jose A. Lozano EMAIL Department of Computer Science and Artificial Intelligence University of the Basque Country UPV/EHU 20018 Donostia, Spain. Basque Center for Applied Mathematics (BCAM) 48009 Bilbao, Spain.
Pseudocode Yes Algorithm 1 Generating adversarial class probability distributions.
Open Source Code Yes Our code is publicly available at: https://github.com/vadel/ACPD (see Appendix D for further details).
Open Datasets Yes We use the Speech Command Dataset (Warden, 2018), which consists of a set of WAV audio files of 30 different spoken commands. We selected the Emotion dataset proposed in Saravia et al. (2018), which contains Tweets categorized in 6 emotions: sadness, joy, love, anger, fear and surprise.
Dataset Splits Yes The dataset contains 46,258 samples, accounting for approximately 13 hours of data, and it is split into training (80%), validation (10%) and test (10%) sets, following the standard partition procedure proposed in Warden (2018). To thoroughly evaluate our methods, we randomly sampled a set X of 1000 inputs per class from the training set of the Speech Command Dataset, and computed a 2-fold cross-validation, using one half of X as X and the other half as X̂. Moreover, we launched 50 repetitions of the cross-validation process, using in every repetition a different random partition of X.
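The repeated 2-fold protocol quoted above can be sketched as follows (a minimal illustration of the splitting step only; the function and variable names are ours, not the authors'):

```python
import random

def two_fold_partitions(X, n_repetitions=50, seed=0):
    """Yield (first_half, second_half) index splits of X, drawing a fresh
    random partition for each repetition, mirroring the paper's setup of
    50 repetitions of 2-fold cross-validation over the sampled set X."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    for _ in range(n_repetitions):
        rng.shuffle(idx)
        half = len(idx) // 2
        yield idx[:half], idx[half:]
```

Each repetition uses one half as the attack-construction set and the other half as the held-out evaluation set, so every sample appears on each side across repetitions.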
Hardware Specification No No specific hardware details (e.g., GPU/CPU models, memory) are mentioned for running experiments.
Software Dependencies No The linear programs are solved using the Python PuLP library and the COIN-OR Branch and Cut (CBC) solver. The Foolbox package (Rauber et al., 2018) and the OpenAttack package (Zeng et al., 2021) were also used. The required Python packages can be consulted in the setup/ directory. However, specific version numbers for these software components are not provided in the main text.
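For reference, solving a linear program with PuLP and its bundled CBC backend looks like the following (a toy sketch; the objective and constraints here are illustrative and are not the paper's actual attack formulation):

```python
from pulp import LpMinimize, LpProblem, LpVariable, value, PULP_CBC_CMD

# Toy LP: minimize x + y subject to two linear constraints.
prob = LpProblem("toy_lp", LpMinimize)
x = LpVariable("x", lowBound=0)
y = LpVariable("y", lowBound=0)
prob += x + y            # objective
prob += x + 2 * y >= 4   # constraint 1
prob += 3 * x + y >= 6   # constraint 2

# CBC is PuLP's default solver; msg=False silences its log output.
prob.solve(PULP_CBC_CMD(msg=False))
```

For this toy problem the optimum sits at the intersection of the two constraints, x = 1.6, y = 1.2, with objective value 2.8.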
Experiment Setup Yes For the AM and the UBM, an upper bound of ξ = 0.01 will be set for the values in L. For the AM, the UBM and the EWTM, γ1 = γ2 = 1 will be set, as well as γ3 = 10 for the UBM, in order to avoid the relaxation of the upper bounds r_{i,j}. The results will be computed under the following maximum distortion thresholds: ϵ ∈ {0.0005, 0.001, 0.0025, 0.005, 0.01, 0.05, 0.1, 0.15}. The DeepFool algorithm was restricted to a maximum of 30 iterations in our experiments. In both cases [FGSM and PGD], a targeted formulation can be obtained by considering the loss with respect to the target class yt, L(x, yt), and perturbing x in the opposite direction of the gradients, that is, −sign(∇x L(x, yt)). The PGD algorithm was restricted to a maximum of 30 iterations. The attack [C&W] was restricted to a maximum of 1000 optimization steps. The parameter κ in Equation (21) controls the desired confidence in the incorrect class yt, and the constant c in Equation (20) balances the trade-off between the perturbation norm and the confidence in the incorrect class. In this paper, κ is set to 0 and a binary search is used to tune the parameter c for every input.
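The targeted FGSM update quoted above (step against the gradient of the loss taken with respect to the target class) can be sketched for a linear softmax classifier; this is our own toy model for illustration, not the networks used in the paper:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def targeted_fgsm(x, W, b, y_target, eps):
    """One targeted FGSM step on a linear softmax classifier p = softmax(Wx + b).
    Moves x in the OPPOSITE direction of sign(grad of cross-entropy w.r.t. the
    target class), which increases the target class probability."""
    p = softmax(W @ x + b)
    onehot = np.zeros_like(p)
    onehot[y_target] = 1.0
    # For a linear model, the gradient of -log p[y_target] w.r.t. x
    # is W^T (p - onehot).
    grad_x = W.T @ (p - onehot)
    return x - eps * np.sign(grad_x)
```

A single small step should raise the model's confidence in the chosen target class; iterating such steps under a projection gives the PGD variant mentioned above.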