Imperceptible, Robust, and Targeted Adversarial Examples for Automatic Speech Recognition

Authors: Yao Qin, Nicholas Carlini, Garrison Cottrell, Ian Goodfellow, Colin Raffel

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Section 7 (Evaluation); Section 7.1 (Datasets and Evaluation Metrics); Table 1 (sentence-level accuracy and WER for 1000 clean and imperceptibly perturbed examples, fed without over-the-air simulation into the Lingvo model); Figure 1 (results of the human study on imperceptibility).
Researcher Affiliation | Collaboration | 1 Department of CSE, University of California, San Diego, USA; 2 Google Brain, USA.
Pseudocode | No | No explicit pseudocode or algorithm blocks are present in the main body of the paper.
Open Source Code | No | The project webpage is at http://cseweb.ucsd.edu/~yaq007/imperceptible-robust-adv.html; this is a general project page and does not explicitly state that code is available.
Open Datasets | Yes | We use the LibriSpeech dataset (Panayotov et al., 2015) in our experiments, which is a corpus of 16kHz English speech derived from audiobooks and is used to train the Lingvo system (Shen et al., 2019).
Dataset Splits | No | We randomly select 1000 audio examples as source examples, and 1000 separate transcriptions from the test-clean dataset to be the targeted transcriptions. This describes data selection for their evaluation, not explicit train/validation/test splits for model training or attack-algorithm development (a selection sketch follows the table).
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments are mentioned in the paper.
Software Dependencies | No | No specific software dependencies with version numbers are mentioned in the paper.
Experiment Setup | Yes | We initially set ϵ to a large value and then gradually reduce it during optimization, following Carlini & Wagner (2018). The parameter α, which balances the network loss ℓ_net(f(x + δ), y) against the imperceptibility loss ℓ_θ(x, δ), is initialized with a small value, e.g., 0.05, and is adaptively updated according to the performance of the attack (an optimization sketch follows the table).
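For context on the Dataset Splits row, below is a minimal sketch of drawing 1000 source utterances and 1000 disjoint target transcriptions from LibriSpeech test-clean. It uses torchaudio rather than the paper's Lingvo pipeline, and the fixed random seed is an assumption; the paper only states that the selection is random.

```python
import random
import torchaudio

# test-clean split of LibriSpeech; download=True fetches it if absent.
dataset = torchaudio.datasets.LIBRISPEECH("./data", url="test-clean", download=True)

indices = list(range(len(dataset)))
random.seed(0)  # assumed seed; the paper does not specify one
random.shuffle(indices)

# 1000 source utterances and 1000 disjoint target transcriptions.
source_ids, target_ids = indices[:1000], indices[1000:2000]
sources = [dataset[i][0] for i in source_ids]  # waveform tensors
targets = [dataset[i][2] for i in target_ids]  # transcript strings
```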
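To make the quoted Experiment Setup concrete, here is a minimal sketch of the two-stage schedule it describes: stage 1 attacks under an ℓ∞ bound ϵ that shrinks whenever the attack succeeds (following Carlini & Wagner, 2018), and stage 2 minimizes ℓ_net + α·ℓ_θ with α adapted to attack performance. The callables net_loss_grad, imp_loss_grad, and is_successful are hypothetical placeholders, and the learning rate, iteration counts, ϵ decay factor, check interval, and α multipliers (1.2 up, 0.8 down) are assumptions, not values from the paper.

```python
import numpy as np

def two_stage_attack(x, y, net_loss_grad, imp_loss_grad, is_successful,
                     eps=0.1, alpha=0.05, lr=1e-3,
                     stage1_iters=1000, stage2_iters=4000, check_every=50):
    """Sketch of the two-stage optimization described in the setup.

    net_loss_grad(x_adv, y) -> (loss, grad w.r.t. x_adv)      [hypothetical]
    imp_loss_grad(x, delta) -> (loss, grad w.r.t. delta)      [hypothetical]
    is_successful(x_adv, y) -> bool: model transcribes y      [hypothetical]
    """
    delta = np.zeros_like(x)

    # Stage 1: l-inf bounded attack; shrink eps whenever the current
    # perturbation already succeeds (following Carlini & Wagner, 2018).
    for _ in range(stage1_iters):
        _, grad = net_loss_grad(x + delta, y)
        delta = np.clip(delta - lr * grad, -eps, eps)
        if is_successful(x + delta, y):
            eps = 0.8 * max(np.abs(delta).max(), 1e-6)  # assumed decay rule
            delta = np.clip(delta, -eps, eps)

    # Stage 2: refine for imperceptibility. alpha starts small (e.g., 0.05)
    # and is raised on success / lowered on failure; the 1.2 / 0.8
    # multipliers and the check interval are assumptions.
    for i in range(1, stage2_iters + 1):
        _, g_net = net_loss_grad(x + delta, y)
        _, g_imp = imp_loss_grad(x, delta)
        delta = delta - lr * (g_net + alpha * g_imp)
        if i % check_every == 0:
            alpha *= 1.2 if is_successful(x + delta, y) else 0.8
    return x + delta
```

Note that stage 2 deliberately drops the hard ℓ∞ constraint: imperceptibility is enforced by the masking-threshold loss ℓ_θ rather than by a norm bound.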