Adversarial Music: Real world Audio Adversary against Wake-word Detection System

Authors: Juncheng Li, Shuhui Qu, Xinjian Li, Joseph Szurley, J. Zico Kolter, Florian Metze

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that we can effectively reduce the recognition F1 score of our emulated model from 93.4% to 11.0%. Finally, we tested our audio adversary over the air, and verified it works effectively against Alexa, reducing its F1 score from 92.5% to 11.0%.
Researcher Affiliation | Collaboration | (1) Carnegie Mellon University, (2) Bosch Center for Artificial Intelligence, (3) Stanford University
Pseudocode | Yes | Algorithm 1: Karplus-Strong algorithm
Open Source Code | Yes | Our code and demo videos can be accessed at https://www.junchengbillyli.com/AdversarialMusic
Open Datasets | Yes | The training data for wake-word detection systems is very limited, so our model is first pre-trained with several large corpora [Cieri et al., 2004, Godfrey et al., 1992, Rousseau et al., 2012] to train a general acoustic model. ... We used the LJ Speech dataset [Ito, 2017] for background noise and negative speech examples (speak anything but "Alexa").
Dataset Splits | No | The paper mentions a 'reserved test set' and a 'collected database... used to validate our emulated model' but does not provide specific percentages or counts for training, validation, and test splits.
Hardware Specification | No | The paper mentions 'MacBook Pro (15-inch, 2018) speaker', 'Alienware MX 18R2 speakers', and 'Logitech Z506 speakers' for playing the adversarial music, but does not specify the computational hardware (e.g., GPU/CPU models, memory) used for training or running the models for the experiments.
Software Dependencies | No | The paper mentions implementing the perturbation model 'with the PyTorch library', but does not specify a version number for PyTorch or any other software dependencies.
Experiment Setup | Yes | When using PGD to train, we restricted the frequency to the range from 27.5 Hz to 4186 Hz in the 88-note space, and restricted the volume to the range from 0 dBA to 100 dBA. Other parameters are defined in the code; we fixed some parameters to speed up the training. ... We simulated our testing environments using the parameters shown in Table 2 to transform t(x), our digital adversary, to compensate for the room impulse responses.
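The headline results above are stated as F1 scores (93.4% → 11.0% for the emulated model, 92.5% → 11.0% for Alexa over the air). For reference, F1 is the harmonic mean of precision and recall; a minimal sketch (the counts below are illustrative, not from the paper):

```python
def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall.

    tp: true positives (wake-word correctly detected)
    fp: false positives (false alarms)
    fn: false negatives (missed wake-words)
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts: 93 detections, 5 false alarms, 8 misses
# gives an F1 of roughly 0.935, comparable to the clean baseline.
f1_clean = f1_score(93, 5, 8)
```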
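The pseudocode row refers to the paper's Algorithm 1, the Karplus-Strong algorithm, a classic physical-modeling technique for synthesizing plucked-string (guitar-like) tones. A minimal NumPy sketch of the standard algorithm (parameter names are illustrative; this is not the authors' implementation):

```python
import numpy as np

def karplus_strong(freq_hz, duration_s, sr=44100, decay=0.996):
    """Classic Karplus-Strong plucked-string synthesis.

    A white-noise burst fills a delay line whose length sets the pitch
    (sr / freq_hz samples); each output sample is taken from the head
    of the line, and a decayed two-point average is fed back in,
    acting as the loop's low-pass filter.
    """
    n_samples = int(sr * duration_s)
    delay = max(2, int(sr / freq_hz))          # delay-line length sets pitch
    buf = np.random.uniform(-1, 1, delay)      # initial excitation: noise burst
    out = np.empty(n_samples)
    for i in range(n_samples):
        out[i] = buf[0]
        avg = decay * 0.5 * (buf[0] + buf[1])  # low-pass + decay in the loop
        buf = np.append(buf[1:], avg)
    return out

# Synthesize a half-second A4 (440 Hz) pluck
tone = karplus_strong(440.0, 0.5)
```

Because the synthesized tone is parameterized by a small set of note frequencies and volumes, it gives the attack a compact, musically plausible search space.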
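The Experiment Setup row describes PGD training with the frequencies restricted to the 88-note piano range (27.5 Hz to 4186 Hz) and the volume to 0-100 dBA. One way to read this constraint is as the projection step of projected gradient descent; a hedged sketch of a single PGD step under those box constraints (the loss here is a toy stand-in, since the real objective scores the wake-word detector):

```python
import torch

# Feasible box from the paper: 88 piano notes, 0-100 dBA volume.
FREQ_MIN, FREQ_MAX = 27.5, 4186.0
VOL_MIN, VOL_MAX = 0.0, 100.0

def pgd_step(freqs, vols, loss, lr=1.0):
    """One projected-gradient step on note frequencies and volumes.

    Takes a signed-gradient step on the loss, then projects the
    parameters back into the feasible frequency/volume box.
    """
    g_f, g_v = torch.autograd.grad(loss, [freqs, vols])
    with torch.no_grad():
        freqs += lr * g_f.sign()
        vols += lr * g_v.sign()
        freqs.clamp_(FREQ_MIN, FREQ_MAX)  # projection onto the note range
        vols.clamp_(VOL_MIN, VOL_MAX)     # projection onto the volume range

# Toy differentiable loss; parameters near the box edge get clamped back.
freqs = torch.tensor([440.0, 4180.0], requires_grad=True)
vols = torch.tensor([50.0, 99.5], requires_grad=True)
loss = freqs.sum() * 0.001 + vols.sum() * 0.01
pgd_step(freqs, vols, loss, lr=10.0)
```

Clamping after each step keeps the adversarial music inside a physically playable range, which matters for the over-the-air attack.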