Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Extending Adversarial Attacks to Produce Adversarial Class Probability Distributions

Authors: Jon Vadillo, Roberto Santana, Jose A. Lozano

JMLR 2023

Reproducibility Variable Result LLM Response
Research Type Experimental We also experimentally validate our approach for the spoken command classification task and the Tweet emotion classification task, two exemplary machine learning problems in the audio and text domain, respectively. Our results demonstrate that we can closely approximate any probability distribution for the classes while maintaining a high fooling rate and even prevent the attacks from being detected by label-shift detection methods.
Researcher Affiliation Academia Jon Vadillo EMAIL Department of Computer Science and Artificial Intelligence University of the Basque Country UPV/EHU 20018 Donostia, Spain. Roberto Santana EMAIL Department of Computer Science and Artificial Intelligence University of the Basque Country UPV/EHU 20018 Donostia, Spain. Jose A. Lozano EMAIL Department of Computer Science and Artificial Intelligence University of the Basque Country UPV/EHU 20018 Donostia, Spain. Basque Center for Applied Mathematics (BCAM) 48009 Bilbao, Spain.
Pseudocode Yes Algorithm 1 Generating adversarial class probability distributions.
Open Source Code Yes Our code is publicly available at: https://github.com/vadel/ACPD (see Appendix D for further details).
Open Datasets Yes We use the Speech Command Dataset (Warden, 2018), which consists of a set of WAV audio files of 30 different spoken commands. We selected the Emotion dataset proposed in Saravia et al. (2018), which contains Tweets categorized in 6 emotions: sadness, joy, love, anger, fear and surprise.
Dataset Splits Yes The dataset contains 46,258 samples, accounting for approximately 13 hours of data, and it is split into training (80%), validation (10%) and test (10%) sets, following the standard partition procedure proposed in Warden (2018). To thoroughly evaluate our methods, we randomly sampled a set X of 1000 inputs per class from the training set of the Speech Command Dataset, and computed a 2-fold cross-validation, using one half of X as X and the other half as X̂. Moreover, we launched 50 repetitions of the cross-validation process, using in every repetition a different random partition of X.
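The repeated 2-fold protocol quoted above can be sketched as follows (a minimal illustration of the splitting step only; the function and variable names are ours, not the authors'):

```python
import random

def two_fold_partitions(X, n_repetitions=50, seed=0):
    """Yield (first_half, second_half) index splits of X, drawing a fresh
    random partition for each repetition, mirroring the paper's setup of
    50 repetitions of 2-fold cross-validation over the sampled set X."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    for _ in range(n_repetitions):
        rng.shuffle(idx)
        half = len(idx) // 2
        yield idx[:half], idx[half:]
```

Each repetition uses one half as the attack-construction set and the other half as the held-out evaluation set, so every sample appears on each side across repetitions.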
Hardware Specification No No specific hardware details (e.g., GPU/CPU models, memory) are mentioned for running experiments.
Software Dependencies No The linear programs are solved using the Python PuLP library and the COIN-OR Branch and Cut (CBC) solver. The Foolbox package (Rauber et al., 2018) and the OpenAttack package (Zeng et al., 2021) were also used. The required Python packages can be consulted in the setup/ directory. However, specific version numbers for these software components are not provided in the main text.
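For reference, solving a linear program with PuLP and its bundled CBC backend looks like the following (a toy sketch; the objective and constraints here are illustrative and are not the paper's actual attack formulation):

```python
from pulp import LpMinimize, LpProblem, LpVariable, value, PULP_CBC_CMD

# Toy LP: minimize x + y subject to two linear constraints.
prob = LpProblem("toy_lp", LpMinimize)
x = LpVariable("x", lowBound=0)
y = LpVariable("y", lowBound=0)
prob += x + y            # objective
prob += x + 2 * y >= 4   # constraint 1
prob += 3 * x + y >= 6   # constraint 2

# CBC is PuLP's default solver; msg=False silences its log output.
prob.solve(PULP_CBC_CMD(msg=False))
```

For this toy problem the optimum sits at the intersection of the two constraints, x = 1.6, y = 1.2, with objective value 2.8.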
Experiment Setup Yes For the AM and the UBM, an upper bound of ξ = 0.01 will be set for the values in L. For the AM, the UBM and the EWTM, γ1 = γ2 = 1 will be set, as well as γ3 = 10 for the UBM, in order to avoid the relaxation of the upper bounds r_{i,j}. The results will be computed under the following maximum distortion thresholds: ϵ ∈ {0.0005, 0.001, 0.0025, 0.005, 0.01, 0.05, 0.1, 0.15}. The DeepFool algorithm was restricted to a maximum of 30 iterations in our experiments. In both cases [FGSM and PGD], a targeted formulation can be obtained by considering the loss with respect to the target class yt, L(x, yt), and perturbing x in the opposite direction of the gradients, that is, −sign(∇x L(x, yt)). The PGD algorithm was restricted to a maximum of 30 iterations. The attack [C&W] was restricted to a maximum of 1000 optimization steps. The parameter κ in Equation (21) controls the desired confidence in the incorrect class yt, and the constant c in Equation (20) balances the trade-off between the perturbation norm and the confidence in the incorrect class. In this paper, κ is set to 0 and a binary search is used to tune the parameter c for every input.
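The targeted FGSM update quoted above (step against the gradient of the loss taken with respect to the target class) can be sketched for a linear softmax classifier; this is our own toy model for illustration, not the networks used in the paper:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def targeted_fgsm(x, W, b, y_target, eps):
    """One targeted FGSM step on a linear softmax classifier p = softmax(Wx + b).
    Moves x in the OPPOSITE direction of sign(grad of cross-entropy w.r.t. the
    target class), which increases the target class probability."""
    p = softmax(W @ x + b)
    onehot = np.zeros_like(p)
    onehot[y_target] = 1.0
    # For a linear model, the gradient of -log p[y_target] w.r.t. x
    # is W^T (p - onehot).
    grad_x = W.T @ (p - onehot)
    return x - eps * np.sign(grad_x)
```

A single small step should raise the model's confidence in the chosen target class; iterating such steps under a projection gives the PGD variant mentioned above.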