Enabling Fast and Universal Audio Adversarial Attack Using Generative Model
Authors: Yi Xie, Zhuohang Li, Cong Shi, Jian Liu, Yingying Chen, Bo Yuan (pp. 14129-14137)
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on DNN-based audio systems show that our proposed FAPG can achieve high success rate with up to 214× speedup over the existing audio adversarial attack methods. |
| Researcher Affiliation | Academia | Rutgers University; The University of Tennessee, Knoxville |
| Pseudocode | Yes | Algorithm 1: Training Procedure of FAPG; Algorithm 2: Training Procedure of UAPG |
| Open Source Code | No | The paper does not provide an explicit statement or link for the open-source code for the methodology described in this paper. The link 'https://kaldi-asr.org/models/m3' refers to a third-party model used, not the authors' own implementation. |
| Open Datasets | Yes | Google Speech Commands dataset (Warden 2018) for command recognition, VCTK dataset (Christophe, Junichi, and Kirsten 2016) for speaker recognition, and UrbanSound8K dataset (Salamon, Jacoby, and Bello 2014) for environmental sound classification. |
| Dataset Splits | Yes | The dataset is split into training, validation, and testing sets with a ratio of 8:1:1 (see the split sketch after the table). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'Adam optimizer' and 'Wave-U-Net', but does not provide specific version numbers for software dependencies. |
| Experiment Setup | Yes | A total of 10,000 training steps are conducted using the Adam optimizer with a batch size of 100. The initial learning rate is set to 1e-4 and gradually decayed to 1e-6. β is set to 0.1 for all datasets. τ is initially set to 0.1 and is reduced to 0.05 and 0.03 at steps 3,000 and 7,000 for command recognition and speaker recognition; for the sound classification model it stops reducing at 0.05 (see the training-setup sketch after the table). |
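
The 8:1:1 split reported in the Dataset Splits row can be reproduced with a simple shuffled partition. This is a minimal sketch, not the authors' code; the sample list, seed, and function name are illustrative assumptions.

```python
import random

def split_dataset(samples, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle and partition samples into train/val/test with an 8:1:1 ratio (illustrative)."""
    rng = random.Random(seed)
    samples = list(samples)
    rng.shuffle(samples)
    n_train = int(len(samples) * ratios[0])
    n_val = int(len(samples) * ratios[1])
    return (samples[:n_train],
            samples[n_train:n_train + n_val],
            samples[n_train + n_val:])

# Example with 1,000 dummy utterance IDs
train, val, test = split_dataset(range(1000))
print(len(train), len(val), len(test))  # 800 100 100
```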
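
The training schedule quoted in the Experiment Setup row (Adam, batch size 100, 10,000 steps, learning rate decaying from 1e-4 to 1e-6, β = 0.1, and a step-wise τ reduction) can be sketched in PyTorch as below. The generator, loss, and data are placeholders rather than the paper's FAPG/UAPG implementation, and the exponential decay curve and tanh-bounded perturbation are assumptions made for the sketch.

```python
import torch

# Hypothetical stand-in for the Wave-U-Net-based perturbation generator mentioned in the paper.
model = torch.nn.Sequential(torch.nn.Linear(128, 128), torch.nn.Tanh())

total_steps, batch_size, beta = 10_000, 100, 0.1

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
# Decay the learning rate from 1e-4 to 1e-6 over 10,000 steps (exact decay curve assumed).
gamma = (1e-6 / 1e-4) ** (1.0 / total_steps)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=gamma)

def tau_at(step, task="command"):
    """Perturbation-budget schedule tau as quoted in the setup."""
    if task in ("command", "speaker"):    # 0.1 -> 0.05 at step 3,000 -> 0.03 at step 7,000
        return 0.1 if step < 3_000 else (0.05 if step < 7_000 else 0.03)
    return 0.1 if step < 3_000 else 0.05  # sound classification stops reducing at 0.05

for step in range(total_steps):
    x = torch.randn(batch_size, 128)       # dummy input batch
    delta = model(x) * tau_at(step)        # bounded perturbation (illustrative)
    loss = beta * delta.pow(2).mean()      # placeholder for the actual attack loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```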