Adversarial Music: Real world Audio Adversary against Wake-word Detection System

Authors: Juncheng Li, Shuhui Qu, Xinjian Li, Joseph Szurley, J. Zico Kolter, Florian Metze

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that we can effectively reduce the recognition F1 score of our emulated model from 93.4% to 11.0%. Finally, we tested our audio adversary over the air, and verified it works effectively against Alexa, reducing its F1 score from 92.5% to 11.0%.
Researcher Affiliation | Collaboration | (1) Carnegie Mellon University, (2) Bosch Center for Artificial Intelligence, (3) Stanford University
Pseudocode | Yes | Algorithm 1: Karplus-Strong algorithm
Open Source Code | Yes | Our code and demo videos can be accessed at https://www.junchengbillyli.com/AdversarialMusic
Open Datasets | Yes | The training data for wake-word detection systems is very limited, so our model is first pre-trained with several large corpora [Cieri et al., 2004, Godfrey et al., 1992, Rousseau et al., 2012] to train a general acoustic model. ... We used the LJ Speech dataset [Ito, 2017] for background noise and negative speech examples (speak anything but "Alexa").
Dataset Splits | No | The paper mentions a 'reserved test set' and a 'collected database... used to validate our emulated model' but does not provide specific percentages or counts for training, validation, and test splits.
Hardware Specification | No | The paper mentions 'MacBook Pro (15-inch, 2018) speaker', 'Alienware MX 18R2 speakers', and 'Logitech Z506 speakers' for playing the adversarial music, but does not specify the computational hardware (e.g., GPU/CPU models, memory) used for training or running the models for the experiments.
Software Dependencies | No | The paper mentions implementing the perturbation model 'with the PyTorch library', but does not specify a version number for PyTorch or any other software dependencies.
Experiment Setup | Yes | When using PGD to train, we restricted the frequency to the range from 27.5 Hz to 4186 Hz in the 88-note space, and restricted the volume to the range from 0 dBA to 100 dBA. Other parameters are defined in the code; we fixed some parameters to speed up the training. ... We simulated our testing environments using the parameters shown in Table 2 to transform t(x), our digital adversary, to compensate for the room impulse responses.
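The headline results above are stated as F1 scores (93.4% → 11.0% for the emulated model, 92.5% → 11.0% for Alexa over the air). For reference, F1 is the harmonic mean of precision and recall; a minimal sketch (the counts below are illustrative, not from the paper):

```python
def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall.

    tp: true positives (wake-word correctly detected)
    fp: false positives (false alarms)
    fn: false negatives (missed wake-words)
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts: 93 detections, 5 false alarms, 8 misses
# gives an F1 of roughly 0.935, comparable to the clean baseline.
f1_clean = f1_score(93, 5, 8)
```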
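The pseudocode row refers to the paper's Algorithm 1, the Karplus-Strong algorithm, a classic physical-modeling technique for synthesizing plucked-string (guitar-like) tones. A minimal NumPy sketch of the standard algorithm (parameter names are illustrative; this is not the authors' implementation):

```python
import numpy as np

def karplus_strong(freq_hz, duration_s, sr=44100, decay=0.996):
    """Classic Karplus-Strong plucked-string synthesis.

    A white-noise burst fills a delay line whose length sets the pitch
    (sr / freq_hz samples); each output sample is taken from the head
    of the line, and a decayed two-point average is fed back in,
    acting as the loop's low-pass filter.
    """
    n_samples = int(sr * duration_s)
    delay = max(2, int(sr / freq_hz))          # delay-line length sets pitch
    buf = np.random.uniform(-1, 1, delay)      # initial excitation: noise burst
    out = np.empty(n_samples)
    for i in range(n_samples):
        out[i] = buf[0]
        avg = decay * 0.5 * (buf[0] + buf[1])  # low-pass + decay in the loop
        buf = np.append(buf[1:], avg)
    return out

# Synthesize a half-second A4 (440 Hz) pluck
tone = karplus_strong(440.0, 0.5)
```

Because the synthesized tone is parameterized by a small set of note frequencies and volumes, it gives the attack a compact, musically plausible search space.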
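The Experiment Setup row describes PGD training with the frequencies restricted to the 88-note piano range (27.5 Hz to 4186 Hz) and the volume to 0-100 dBA. One way to read this constraint is as the projection step of projected gradient descent; a hedged sketch of a single PGD step under those box constraints (the loss here is a toy stand-in, since the real objective scores the wake-word detector):

```python
import torch

# Feasible box from the paper: 88 piano notes, 0-100 dBA volume.
FREQ_MIN, FREQ_MAX = 27.5, 4186.0
VOL_MIN, VOL_MAX = 0.0, 100.0

def pgd_step(freqs, vols, loss, lr=1.0):
    """One projected-gradient step on note frequencies and volumes.

    Takes a signed-gradient step on the loss, then projects the
    parameters back into the feasible frequency/volume box.
    """
    g_f, g_v = torch.autograd.grad(loss, [freqs, vols])
    with torch.no_grad():
        freqs += lr * g_f.sign()
        vols += lr * g_v.sign()
        freqs.clamp_(FREQ_MIN, FREQ_MAX)  # projection onto the note range
        vols.clamp_(VOL_MIN, VOL_MAX)     # projection onto the volume range

# Toy differentiable loss; parameters near the box edge get clamped back.
freqs = torch.tensor([440.0, 4180.0], requires_grad=True)
vols = torch.tensor([50.0, 99.5], requires_grad=True)
loss = freqs.sum() * 0.001 + vols.sum() * 0.01
pgd_step(freqs, vols, loss, lr=10.0)
```

Clamping after each step keeps the adversarial music inside a physically playable range, which matters for the over-the-air attack.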