Generating Robust Audio Adversarial Examples with Temporal Dependency
Authors: Hongting Zhang, Pan Zhou, Qiben Yan, Xiao-Yang Liu
IJCAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results and a user study both suggest that the generated adversarial examples can significantly reduce human-perceptible noise and resist defenses based on temporal structure. We implement a successful attack on the latest model of an end-to-end CNN-based ASR system, Wav2letter+, with a differentiable Mel Frequency Cepstral Coefficient (MFCC) feature extraction. Experimental results show that the adversarial examples are effective even under the temporal dependency based defense (TD defense). The user study shows that our adversarial examples have the highest audio quality so far. (A hedged sketch of a differentiable MFCC front end appears after this table.) |
| Researcher Affiliation | Academia | Hongting Zhang (1), Qiben Yan (2), Pan Zhou (1) and Xiao-Yang Liu (3); (1) Huazhong University of Science and Technology, (2) Michigan State University, (3) Columbia University; htzhang@hust.edu.cn, qyan@msu.edu, panzhou@hust.edu.cn, xl2427@columbia.edu |
| Pseudocode | Yes | The pseudocode of the proposed algorithm is presented in Algorithm 1. |
| Open Source Code | No | The paper does not provide any statement about releasing the source code for its described methodology, nor does it include a link to a code repository. |
| Open Datasets | Yes | LibriSpeech [Panayotov et al., 2015] is a corpus of approximately 1,000 hours of 16 kHz English speech derived from audiobooks from the LibriVox project. |
| Dataset Splits | Yes | It comes with its own training and validation sets, as well as test-clean and test-other sets. We use all available samples to train and validate our ASR system. (A hedged sketch of loading these splits is given after this table.) |
| Hardware Specification | Yes | All experiments are carried out on an Ubuntu Server (16.04 LTS) with an Intel Core i5-6500 @ 3.20 GHz × 4, 16 GB memory, and a GTX 1080 GPU. |
| Software Dependencies | No | The paper mentions that "we implement Wav2letter+ in Pytorch as our adversarial model" but does not specify version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | In our experiments, we set the learning rate as 1e-5 in the first stage and 5e-5 in the second stage. To strike a balance between epochs and distortion, we set the width B to 0.2 in all the following experiments to generate adversarial examples with high quality while reducing runtime cost. (A hedged sketch of the two-stage learning-rate schedule follows this table.) |
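
The research-type row notes that the attack runs end to end through a differentiable MFCC feature extraction into Wav2letter+. As a rough illustration only, the sketch below builds a differentiable MFCC front end with torchaudio; the 16 kHz sample rate matches LibriSpeech, while the number of coefficients and the dummy waveform are assumptions rather than values taken from the paper.

```python
import torch
import torchaudio

# Differentiable MFCC front end; 16 kHz matches LibriSpeech audio.
# n_mfcc=40 is an assumed value, not one reported by the paper.
mfcc = torchaudio.transforms.MFCC(sample_rate=16000, n_mfcc=40)

# Dummy 1-second waveform standing in for a benign audio sample.
waveform = torch.randn(1, 16000, requires_grad=True)

features = mfcc(waveform)  # shape: (1, n_mfcc, n_frames)

# Gradients flow from the features back to the raw waveform, which is what
# lets an adversarial perturbation be optimized through the whole pipeline.
features.sum().backward()
print(waveform.grad.shape)  # torch.Size([1, 16000])
```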
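
For the dataset rows, the sketch below pulls the LibriSpeech splits named above through torchaudio.datasets.LIBRISPEECH. The local root path is hypothetical, and only the 100-hour clean training subset is loaded to keep the example small; the paper reports using all available training samples.

```python
import os
import torchaudio

root = "./data"  # hypothetical download location
os.makedirs(root, exist_ok=True)

# One training subset plus the validation and test splits mentioned in the paper.
train = torchaudio.datasets.LIBRISPEECH(root, url="train-clean-100", download=True)
dev = torchaudio.datasets.LIBRISPEECH(root, url="dev-clean", download=True)
test_clean = torchaudio.datasets.LIBRISPEECH(root, url="test-clean", download=True)
test_other = torchaudio.datasets.LIBRISPEECH(root, url="test-other", download=True)

# Each item is (waveform, sample_rate, transcript, speaker_id, chapter_id, utterance_id).
waveform, sample_rate, transcript, *_ = test_clean[0]
print(sample_rate, transcript)
```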
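
For the experiment-setup row, the sketch below only illustrates the reported two-stage learning-rate schedule (1e-5, then 5e-5) applied to a perturbation variable. The Adam optimizer, the placeholder loss, and the stage lengths are assumptions for illustration and are not the paper's actual attack objective.

```python
import torch

# Perturbation over a 1-second, 16 kHz waveform (sizes are assumptions).
delta = torch.zeros(16000, requires_grad=True)
optimizer = torch.optim.Adam([delta], lr=1e-5)  # stage-1 learning rate from the paper

def attack_loss(perturbation):
    # Placeholder objective; the real attack would combine a targeted ASR loss
    # (e.g. CTC on the target transcript) with a distortion penalty.
    return perturbation.pow(2).mean()

STAGE1_STEPS, STAGE2_STEPS = 1000, 1000  # assumed stage lengths
for step in range(STAGE1_STEPS + STAGE2_STEPS):
    if step == STAGE1_STEPS:  # switch to the stage-2 learning rate
        for group in optimizer.param_groups:
            group["lr"] = 5e-5
    optimizer.zero_grad()
    loss = attack_loss(delta)
    loss.backward()
    optimizer.step()
```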