Characterizing Audio Adversarial Examples Using Temporal Dependency
Authors: Zhuolin Yang, Bo Li, Pin-Yu Chen, Dawn Song
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Tested on automatic speech recognition (ASR) tasks and three recent audio adversarial attacks, we find that (i) input transformations developed from image adversarial defenses provide limited robustness improvement and are brittle against advanced attacks; (ii) temporal dependency can be exploited to gain discriminative power against audio adversarial examples and is resistant to the adaptive attacks considered in our experiments. |
| Researcher Affiliation | Collaboration | Zhuolin Yang (Shanghai Jiao Tong University); Bo Li (University of Illinois at Urbana-Champaign); Pin-Yu Chen (IBM Research); Dawn Song (UC Berkeley) |
| Pseudocode | No | The paper describes its methods verbally and with figures, but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured steps formatted like code. |
| Open Source Code | No | The paper does not provide any statements about releasing code for its methodology, nor does it include links to a source code repository. |
| Open Datasets | Yes | LibriSpeech dataset: LibriSpeech (Panayotov et al., 2015) is a corpus of approximately 1000 hours of 16kHz English speech derived from audiobooks from the LibriVox project. [...] Mozilla Common Voice dataset: Common Voice is a large audio dataset provided by Mozilla. This dataset is public and contains samples of human speech. We used the 16kHz-sampled data released in (Carlini & Wagner, 2018), whose average duration is 3.998s. [...] Speech Commands dataset: the Speech Commands dataset (Warden, 2018) is an audio dataset containing 65,000 audio files. |
| Dataset Splits | No | The paper mentions using specific test subsets of datasets (e.g., the 'test-clean' set of LibriSpeech, the 'first 100 samples from its test dataset' for Common Voice) and indicates that models such as Deep Speech are used, but it does not provide explicit training, validation, and test splits (percentages or counts) for its experimental setup or for how the models were trained. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU models, CPU types, or cloud computing specifications. |
| Software Dependencies | No | The paper mentions software like 'Kaldi speech recognition platform' and 'Deep Speech speech-to-text transcription network' but does not provide specific version numbers for these or any other software dependencies, libraries, or programming languages used. |
| Experiment Setup | Yes | We implemented their attack with 500 iterations, limited the magnitude of the adversarial perturbation to 5, and generated 50 adversarial examples per attack task (more targets are shown in the supplementary material). |
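The temporal-dependency defense summarized in the table can be sketched in a few lines: transcribe the first *k* fraction of the audio, transcribe the whole audio, and check whether the partial transcript matches the corresponding prefix of the full transcript. Benign speech tends to stay consistent under this check, while adversarial perturbations tend not to. The sketch below is an illustration under stated assumptions, not the paper's implementation; `transcribe` is a hypothetical stand-in for a real ASR system (e.g., Deep Speech), and the distance metric and threshold are illustrative choices.

```python
# Hedged sketch of a temporal-dependency (TD) consistency check.
# `transcribe` is a placeholder for an ASR system -- an assumption for
# illustration; the paper does not prescribe this exact interface.

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (standard DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def td_consistency(audio, transcribe, k: float = 0.5) -> float:
    """Character error rate between the transcript of the first k fraction
    of the audio and the matching prefix of the full transcript.
    Lower values mean the two transcripts agree (more consistent)."""
    n = int(len(audio) * k)
    partial = transcribe(audio[:n])
    full = transcribe(audio)
    prefix = full[:len(partial)]
    if not prefix:
        return 0.0
    return edit_distance(partial, prefix) / len(prefix)

def is_adversarial(audio, transcribe, k: float = 0.5,
                   threshold: float = 0.3) -> bool:
    # Flag the input when prefix consistency degrades past a chosen
    # threshold; 0.3 is an illustrative value, not one from the paper.
    return td_consistency(audio, transcribe, k) > threshold
```

As a toy usage, with an identity "transcriber" (benign behavior) the prefix transcripts agree and the CER is 0, whereas a mock transcriber that only flips its output on the full-length input, mimicking an attack that succeeds only on the whole waveform, drives the CER up and gets flagged.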