Characterizing Audio Adversarial Examples Using Temporal Dependency
Authors: Zhuolin Yang, Bo Li, Pin-Yu Chen, Dawn Song
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Tested on automatic speech recognition (ASR) tasks and three recent audio adversarial attacks, we find that (i) input transformations developed from image adversarial defenses provide limited robustness improvement and are brittle against advanced attacks; (ii) temporal dependency can be exploited to gain discriminative power against audio adversarial examples and is resistant to the adaptive attacks considered in our experiments. |
| Researcher Affiliation | Collaboration | Zhuolin Yang (Shanghai Jiao Tong University); Bo Li (University of Illinois at Urbana-Champaign); Pin-Yu Chen (IBM Research); Dawn Song (UC Berkeley) |
| Pseudocode | No | The paper describes its methods verbally and with figures, but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured steps formatted like code. |
| Open Source Code | No | The paper does not provide any statements about releasing code for its methodology, nor does it include links to a source code repository. |
| Open Datasets | Yes | LibriSpeech dataset: LibriSpeech (Panayotov et al., 2015) is a corpus of approximately 1000 hours of 16kHz English speech derived from audiobooks from the LibriVox project. [...] Mozilla Common Voice dataset: Common Voice is a large audio dataset provided by Mozilla. This dataset is public and contains samples of human speech. We used the 16kHz-sampled data released in (Carlini & Wagner, 2018), whose average duration is 3.998s. [...] Speech Commands dataset: the Speech Commands dataset (Warden, 2018) is an audio dataset containing 65,000 audio files. |
| Dataset Splits | No | The paper mentions using specific test subsets of datasets (e.g., the 'test-clean' set of LibriSpeech, the 'first 100 samples from its test dataset' for Common Voice) and indicates that models such as Deep Speech are used, but it does not provide explicit training, validation, and test splits (percentages or counts) for its experimental setup or for how the models were trained. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU models, CPU types, or cloud computing specifications. |
| Software Dependencies | No | The paper mentions software like 'Kaldi speech recognition platform' and 'Deep Speech speech-to-text transcription network' but does not provide specific version numbers for these or any other software dependencies, libraries, or programming languages used. |
| Experiment Setup | Yes | We implemented their attack with 500 iterations, limited the magnitude of the adversarial perturbation to 5, and generated 50 adversarial examples per attack task (more targets are shown in the supplementary material). |
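The temporal-dependency defense summarized in the table can be sketched in a few lines: transcribe the first *k* fraction of the audio, transcribe the whole audio, and check whether the partial transcript matches the corresponding prefix of the full transcript. Benign speech tends to stay consistent under this check, while adversarial perturbations tend not to. The sketch below is an illustration under stated assumptions, not the paper's implementation; `transcribe` is a hypothetical stand-in for a real ASR system (e.g., Deep Speech), and the distance metric and threshold are illustrative choices.

```python
# Hedged sketch of a temporal-dependency (TD) consistency check.
# `transcribe` is a placeholder for an ASR system -- an assumption for
# illustration; the paper does not prescribe this exact interface.

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (standard DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def td_consistency(audio, transcribe, k: float = 0.5) -> float:
    """Character error rate between the transcript of the first k fraction
    of the audio and the matching prefix of the full transcript.
    Lower values mean the two transcripts agree (more consistent)."""
    n = int(len(audio) * k)
    partial = transcribe(audio[:n])
    full = transcribe(audio)
    prefix = full[:len(partial)]
    if not prefix:
        return 0.0
    return edit_distance(partial, prefix) / len(prefix)

def is_adversarial(audio, transcribe, k: float = 0.5,
                   threshold: float = 0.3) -> bool:
    # Flag the input when prefix consistency degrades past a chosen
    # threshold; 0.3 is an illustrative value, not one from the paper.
    return td_consistency(audio, transcribe, k) > threshold
```

As a toy usage, with an identity "transcriber" (benign behavior) the prefix transcripts agree and the CER is 0, whereas a mock transcriber that only flips its output on the full-length input, mimicking an attack that succeeds only on the whole waveform, drives the CER up and gets flagged.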