Transferable Perturbations of Deep Feature Distributions

Authors: Nathan Inkawhich, Kevin Liang, Lawrence Carin, Yiran Chen

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Almost all current adversarial attacks of CNN classifiers rely on information derived from the output layer of the network. This work presents a new adversarial attack based on the modeling and exploitation of class-wise and layer-wise deep feature distributions. We achieve state-of-the-art targeted blackbox transfer-based attack results for undefended ImageNet models. Further, we place a priority on explainability and interpretability of the attacking process. Our methodology affords an analysis of how adversarial attacks change the intermediate feature distributions of CNNs, as well as a measure of layer-wise and class-wise feature distributional separability/entanglement. We also conceptualize a transition from task/data-specific to model-specific features within a CNN architecture that directly impacts the transferability of adversarial examples.
Researcher Affiliation | Academia | Nathan Inkawhich, Kevin J. Liang, Lawrence Carin & Yiran Chen, Department of Electrical and Computer Engineering, Duke University, {nathan.inkawhich,kevin.liang,lcarin,yiran.chen}@duke.edu
Pseudocode | No | The paper describes the optimization procedure in text (Appendix D) but does not include structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not include a statement about releasing the code for their method or a direct link to a code repository.
Open Datasets | Yes | The models are trained on ImageNet-1k (Deng et al., 2009).
Dataset Splits | Yes | We use the ImageNet-1K validation set as the test dataset.
Hardware Specification | No | The paper does not specify the hardware (e.g., GPU models, CPU types, memory) used for running the experiments.
Software Dependencies | No | The paper mentions using models from the PyTorch Model Zoo but does not specify version numbers for PyTorch, Python, or other software libraries used for the implementation.
Experiment Setup | Yes | The architecture of all auxiliary models is the same, regardless of model, layer, or class. Each is a 2-hidden-layer NN with a single output unit. There are 200 neurons in each hidden layer and the number of input units matches the size of the input feature map. To train the auxiliary models, unbiased batches from the whole ImageNet-1k training set are pushed through the truncated pretrained model (f_l), and the extracted features are used to train the auxiliary model parameters. ... All targeted adversarial examples are constrained by ℓ∞ ε = 16/255 as described in (Dong et al., 2018; Kurakin et al., 2018). As experimentally found, λ = 0.8 in (2) and η = 1e-6 in (3). Finally, as measured over the initially correctly classified subset of the test dataset (by both the whitebox and blackbox models), attack success is captured in two metrics: Error is the percentage of examples that the blackbox misclassifies, and Targeted Success Rate (tSuc) is the percentage of examples that the blackbox misclassifies as the target label.
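
To make the quoted setup concrete, the listing below gives a minimal PyTorch sketch of (i) the per-layer, per-class auxiliary model (two hidden layers of 200 units and a single output unit over the input feature map), (ii) an illustrative PGD-style loop that perturbs an input within the ℓ∞ ε = 16/255 ball to raise the auxiliary model's predicted target-class probability, and (iii) the Error and tSuc metrics. The names AuxiliaryModel, targeted_feature_attack, and attack_metrics, as well as the flattening of the feature map and the step schedule, are illustrative assumptions rather than code from the paper, and the attack loop omits the paper's λ- and η-weighted terms from Eqs. (2)-(3).

import torch
import torch.nn as nn


class AuxiliaryModel(nn.Module):
    """Per-(layer, class) auxiliary model: two hidden layers of 200 units and a
    single sigmoid output unit, applied to the flattened intermediate feature map."""

    def __init__(self, feature_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),                      # assumption: feature map is flattened
            nn.Linear(feature_dim, 200), nn.ReLU(),
            nn.Linear(200, 200), nn.ReLU(),
            nn.Linear(200, 1),                 # single output unit
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # Probability that the features belong to the modeled class.
        return torch.sigmoid(self.net(features))


def targeted_feature_attack(x, truncated_model, aux_model,
                            eps=16 / 255, step_size=2 / 255, steps=10):
    """Illustrative PGD-style loop only: perturb x within an l-infinity ball of
    radius eps so that the target-class auxiliary model assigns high probability
    to the intermediate features f_l(x + delta). The paper's full objective
    (Eqs. (2)-(3), with the lambda- and eta-weighted terms) is not reproduced here."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        prob = aux_model(truncated_model(x + delta))
        loss = -torch.log(prob + 1e-12).mean()  # maximize target-class probability
        loss.backward()
        with torch.no_grad():
            delta -= step_size * delta.grad.sign()
            delta.clamp_(-eps, eps)
        delta.grad.zero_()
    return (x + delta).detach().clamp(0, 1)


def attack_metrics(blackbox_preds, true_labels, target_labels):
    """Error and targeted success rate (tSuc), computed over the subset of test
    examples initially classified correctly by both whitebox and blackbox models."""
    error = (blackbox_preds != true_labels).float().mean().item()
    t_suc = (blackbox_preds == target_labels).float().mean().item()
    return error, t_suc

In this sketch, one AuxiliaryModel would be trained per (layer, class) pair on features extracted from the truncated pretrained model, matching the training description quoted above; a hypothetical call such as adv = targeted_feature_attack(x, truncated_model, aux_models[(layer, target_class)]) then produces the constrained adversarial example.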