Adversarial Music: Real world Audio Adversary against Wake-word Detection System
Authors: Juncheng Li, Shuhui Qu, Xinjian Li, Joseph Szurley, J. Zico Kolter, Florian Metze
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that we can effectively reduce the recognition F1 score of our emulated model from 93.4% to 11.0%. Finally, we tested our audio adversary over the air, and verified it works effectively against Alexa, reducing its F1 score from 92.5% to 11.0%. |
| Researcher Affiliation | Collaboration | 1Carnegie Mellon University, 2Bosch Center for Artificial Intelligence, 3Stanford University |
| Pseudocode | Yes | Algorithm 1: Karplus-Strong algorithm (a synthesis sketch follows the table) |
| Open Source Code | Yes | Our code and demo videos can be accessed at https://www.junchengbillyli.com/AdversarialMusic |
| Open Datasets | Yes | The training data for wake-word detection systems is very limited, so our model is first pre-trained with several large corpora [Cieri et al., 2004, Godfrey et al., 1992, Rousseau et al., 2012] to train a general acoustic model. ... We used LJ speech dataset [Ito, 2017] for background noise and negative speech examples (speak anything but "Alexa"). |
| Dataset Splits | No | The paper mentions a 'reserved test set' and a 'collected database... used to validate our emulated model' but does not provide specific percentages or counts for training, validation, and test splits. |
| Hardware Specification | No | The paper mentions 'MacBook Pro (15-inch, 2018) speaker', 'Alienware MX 18R2 speakers', and 'Logitech Z506 speakers' for playing the adversarial music, but does not specify the computational hardware (e.g., GPU/CPU models, memory) used for training or running the models for the experiments. |
| Software Dependencies | No | The paper mentions implementing the perturbation model 'with the PyTorch library', but does not specify a version number for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | When using PGD to train, we restricted the frequency to 27.5 Hz–4186 Hz in the 88-note space, and restricted the volume from 0 dBA to 100 dBA. Other parameters are defined in the code; we fixed some parameters to speed up the training. ... We simulated our testing environments using the parameters shown in Table 2 to transform our digital adversary t(x) and compensate for the room impulse responses. (A PGD sketch under these box constraints follows the table.) |
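
The paper's Algorithm 1 parameterizes the adversarial music with Karplus-Strong plucked-string synthesis. Below is a minimal NumPy sketch of the standard Karplus-Strong algorithm for reference; the function name, sample rate, and decay factor are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def karplus_strong(freq_hz, duration_s, sample_rate=16000, decay=0.996):
    """Minimal Karplus-Strong plucked-string synthesis (sketch)."""
    period = int(sample_rate / freq_hz)          # delay-line length sets the pitch
    buf = np.random.uniform(-1.0, 1.0, period)   # white-noise excitation burst
    out = np.empty(int(duration_s * sample_rate))
    for i in range(len(out)):
        out[i] = buf[i % period]
        # averaging low-pass filter in the feedback loop, with slight decay
        buf[i % period] = decay * 0.5 * (buf[i % period] + buf[(i + 1) % period])
    return out

# example: a one-second, 440 Hz guitar-like note
note = karplus_strong(440.0, 1.0)
```

The delay-line length fixes the pitch and the averaging filter makes high frequencies die out faster, which is what gives the output its plucked-string timbre.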
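The experiment setup constrains PGD to the 88-note frequency range (27.5 Hz to 4186 Hz) and a 0 to 100 dBA volume range. The following is a minimal sketch of one projected-gradient step under those box constraints, assuming note parameters of the form (frequency, volume); `model`, `loss_fn`, `synthesize`, and `transform` are hypothetical placeholders standing in for the authors' PyTorch pipeline, with `transform` playing the role of the paper's t(x) room simulation.

```python
import torch

FREQ_MIN, FREQ_MAX = 27.5, 4186.0   # 88-note frequency range in Hz, per the paper
VOL_MIN, VOL_MAX = 0.0, 100.0       # volume bounds in dBA, per the paper

def pgd_step(params, model, target, loss_fn, synthesize, transform,
             step_size=0.01):
    """One projected-gradient step on note parameters (sketch, not the
    authors' code). params[..., 0] is frequency in Hz, params[..., 1] is
    volume in dBA; `transform` stands in for the paper's t(x) room simulation.
    """
    params = params.clone().detach().requires_grad_(True)
    audio = transform(synthesize(params))   # render notes, simulate the room
    loss = loss_fn(model(audio), target)    # e.g., wake-word detection score
    loss.backward()
    with torch.no_grad():
        # signed-gradient step that lowers the detection loss
        updated = params - step_size * params.grad.sign()
        # project back into the feasible box reported in the paper
        updated[..., 0].clamp_(FREQ_MIN, FREQ_MAX)
        updated[..., 1].clamp_(VOL_MIN, VOL_MAX)
    return updated.detach()
```

Iterating a step like this while suppressing the detector's score is the general mechanism by which the authors drive the emulated model's F1 from 93.4% down to 11.0%.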