Generating Natural Adversarial Examples
Authors: Zhengli Zhao, Dheeru Dua, Sameer Singh
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We include experiments to show that the generated adversaries are natural, legible to humans, and useful in evaluating and analyzing black-box classifiers. We apply our approach to both image and text domains, and generate adversaries that are more natural and grammatical, semantically close to the input, and helpful to interpret the local behavior of black-box models. Experiments and human evaluation also demonstrate that our approach can help evaluate the robustness of black-box classifiers, even without labeled training data. |
| Researcher Affiliation | Academia | Zhengli Zhao, University of California, Irvine, CA 92697, USA, zhengliz@uci.edu; Dheeru Dua, University of California, Irvine, CA 92697, USA, ddua@uci.edu; Sameer Singh, University of California, Irvine, CA 92697, USA, sameer@uci.edu |
| Pseudocode | Yes | Appendix B is titled 'ALGORITHMS' and contains 'Algorithm 1 Iterative stochastic search in latent space for adversaries' and 'Algorithm 2 Hybrid shrinking search in latent space for adversaries'. (A minimal code sketch of Algorithm 1 follows this table.) |
| Open Source Code | Yes | Code used to generate such natural adversaries is available at https://github.com/zhengliz/natural-adversary. |
| Open Datasets | Yes | We apply our approach to two standard datasets, MNIST and LSUN... We train our framework on the Stanford Natural Language Inference (SNLI) (Bowman et al., 2015) data of 570k labeled human-written English sentence pairs with the same preprocessing as Zhao et al. (2017). |
| Dataset Splits | No | The paper mentions using 'test images' and 'test sentences' for evaluation but does not give specific train/validation/test split details, such as percentages or counts for each partition, nor does it refer to a standard validation split. It reports total dataset sizes (e.g., '60,000 MNIST images', '570k labeled human-written English sentence pairs') and uses test sets, but provides no information on validation sets or an explicit splitting methodology across all three partitions. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper refers to software components like 'WGAN', 'ARAE', 'LSTM', 'CNN', but does not provide specific version numbers for any of these software dependencies, which would be necessary for reproducibility. |
| Experiment Setup | Yes | We use r = 0.01 and N = 5000 with model details in Appendix C. We train a WGAN with z ∈ ℝ^64... with a generator consisting of 3 transposed convolutional layers and ReLU activation, and a critic consisting of 3 convolutional layers with filter sizes (64, 128, 256) and strides (2, 2, 2). We include an inverter with 2 fully connected layers of dimensions (4096, 1024). For Church Outdoor and Tower images from the LSUN dataset, we follow similar procedures as in Gulrajani et al. (2017), training a WGAN with latent z ∈ ℝ^128. The generator and critic are both residual networks. We use pre-activation residual blocks with two 3×3 convolutional layers each and ReLU activation. The critic of 4 residual blocks performs downsampling using mean pooling after the second convolution, while the generator contains 4 residual blocks performing nearest-neighbor upsampling before the second convolution. We include an inverter with 3 fully connected layers of dimensions (8192, 2048, 512) on top of the critic's last hidden layer. (A minimal code sketch of the MNIST setup follows this table.) |
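As supporting context for the 'Pseudocode' row, below is a minimal Python sketch of the paper's Algorithm 1 (iterative stochastic search in latent space for adversaries), using the reported search parameters Δr = 0.01 and N = 5000. The classifier `f`, generator `G`, and inverter `I` are assumed to be callables supplied by the user, and the exact sampling distribution within each search ring is a paraphrase of the paper's description, not its verbatim algorithm; consult Appendix B of the paper for the authoritative version.

```python
import numpy as np

def iterative_stochastic_search(x, f, G, I, delta_r=0.01, n_samples=5000, seed=0):
    """Sketch of Algorithm 1: find a natural adversary for input x.

    f: black-box classifier, G: trained generator, I: trained inverter.
    Searches outward in latent space in rings of width delta_r until a
    generated sample flips f's prediction, then returns the closest one.
    """
    rng = np.random.default_rng(seed)
    z = I(x)            # map the input into the latent space
    y = f(x)            # prediction we want to change
    r_low, r_high = 0.0, delta_r
    while True:
        # sample perturbation directions uniformly on the unit sphere,
        # with norms drawn from the current search ring (r_low, r_high]
        d = rng.standard_normal((n_samples, z.shape[-1]))
        d /= np.linalg.norm(d, axis=1, keepdims=True)
        radii = rng.uniform(r_low, r_high, size=(n_samples, 1))
        candidates = z + d * radii
        # keep candidates whose generated samples change the prediction
        flipped = [zt for zt in candidates if f(G(zt)) != y]
        if flipped:
            # return the natural adversary closest to z in latent space
            z_adv = min(flipped, key=lambda zt: np.linalg.norm(zt - z))
            return G(z_adv)
        r_low, r_high = r_high, r_high + delta_r  # widen the search ring
```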
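Likewise, for the 'Experiment Setup' row, here is a hedged PyTorch sketch of the reported MNIST configuration: z ∈ ℝ^64, a generator of 3 transposed convolutional layers with ReLU, a critic of 3 convolutional layers with filters (64, 128, 256) and strides (2, 2, 2), and an inverter with 2 fully connected layers of dimensions (4096, 1024). Kernel sizes, paddings, the critic's activation, and the inverter's input/output wiring are assumptions, since the paper reports only filter counts, strides, and layer widths.

```python
import torch.nn as nn

LATENT_DIM = 64  # z ∈ R^64 for MNIST, per the paper

# Critic: 3 conv layers with filters (64, 128, 256) and strides (2, 2, 2).
# Kernel size 5 with padding 2 (28 -> 14 -> 7 -> 4) and the ReLU
# activations are assumptions.
critic = nn.Sequential(
    nn.Conv2d(1, 64, 5, stride=2, padding=2), nn.ReLU(),
    nn.Conv2d(64, 128, 5, stride=2, padding=2), nn.ReLU(),
    nn.Conv2d(128, 256, 5, stride=2, padding=2), nn.ReLU(),
    nn.Flatten(),
    nn.Linear(256 * 4 * 4, 1),  # scalar Wasserstein critic output
)

# Generator: 3 transposed convolutional layers with ReLU activation,
# mirroring the critic (4 -> 7 -> 14 -> 28); output nonlinearity assumed.
generator = nn.Sequential(
    nn.Linear(LATENT_DIM, 256 * 4 * 4), nn.ReLU(),
    nn.Unflatten(1, (256, 4, 4)),
    nn.ConvTranspose2d(256, 128, 5, stride=2, padding=2, output_padding=0), nn.ReLU(),
    nn.ConvTranspose2d(128, 64, 5, stride=2, padding=2, output_padding=1), nn.ReLU(),
    nn.ConvTranspose2d(64, 1, 5, stride=2, padding=2, output_padding=1), nn.Tanh(),
)

# Inverter: 2 fully connected hidden layers of dimensions (4096, 1024)
# mapping an image back to a latent code; the flattened-pixel input and
# the final projection to LATENT_DIM are assumptions.
inverter = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 4096), nn.ReLU(),
    nn.Linear(4096, 1024), nn.ReLU(),
    nn.Linear(1024, LATENT_DIM),
)
```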