Learning Efficient Representations for Fake Speech Detection

Authors: Nishant Subramani, Delip Rao

AAAI 2020, pp. 5859-5866

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present four parameter-efficient convolutional architectures for fake speech detection with best detection F1 scores of around 97 points on a large dataset of fake and bonafide speech. We show how the fake speech detection task naturally lends itself to a novel multi-task problem further improving F1 scores for a mere 0.5% increase in model parameters.
Researcher Affiliation | Industry | Nishant Subramani, Delip Rao, AI Foundation, San Francisco, California, {nishant, delip}@aifoundation.com
Pseudocode | No | The paper describes model architectures using block diagrams and textual descriptions (e.g., "Efficient CNN A model consisting of an input processing block, 4 convolution blocks, and a classification block."), but does not provide formal pseudocode or algorithm blocks. (A hedged architecture sketch appears after this table.)
Open Source Code | No | The paper does not contain any statements about releasing its own source code, nor does it provide a link to a code repository for the described methodology.
Open Datasets | Yes | We use datasets from Todisco et al. (2019), originally created for the ASVSpoof 2019 challenge. We select 226 speakers from the Mozilla Common Voice project on the English side which have 3 or more contributed utterances and have demographic information. To construct the training set, we randomly choose 17 sentences from the English side of the IWSLT16 English to German translation training set. (A hedged data-selection sketch appears after this table.)
Dataset Splits | Yes | Table 1 (Summary of ASVSpoof2019 data used here): Training #samples 25,380; Validation #samples 5,438. Our 5S recipe produces a set of 2624 BONAFIDE and 3842 FAKE utterances for training as well as 660 BONAFIDE and 1001 FAKE examples for validation.
Hardware Specification | Yes | For our experiments, we use a single NVIDIA V100 GPU unless otherwise specified.
Software Dependencies | No | The paper mentions using an "Adam optimizer with default parameters" and references a paper (Kingma and Ba, 2014), but it does not specify any software libraries (e.g., PyTorch, TensorFlow, scikit-learn) or their version numbers.
Experiment Setup | Yes | For optimization, we use mini-batch SGD with a batch size of 128 and an Adam optimizer with default parameters (α = 10⁻³, β = [0.9, 0.999]) (Kingma and Ba, 2014). We halve the learning rate, α, during training whenever there is no improvement in validation set loss: completely stopping when the learning rate drops below 10⁻⁵. (A hedged training-loop sketch appears after this table.)
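
The "Efficient CNN A" description quoted in the Pseudocode row (an input processing block, 4 convolution blocks, and a classification block) can be illustrated with a minimal sketch. The paper does not name a framework or give layer hyperparameters, so the PyTorch code, channel counts, kernel sizes, pooling, and the EfficientCNN/ConvBlock names below are all assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the block structure described in the Pseudocode row:
# an input processing block, 4 convolution blocks, and a classification block.
# Framework, channel counts, kernel sizes, and pooling are assumptions.
import torch
import torch.nn as nn


class ConvBlock(nn.Module):
    """One convolution block: Conv2d -> BatchNorm -> ReLU -> MaxPool (assumed layout)."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x)


class EfficientCNN(nn.Module):
    """Input processing block, 4 convolution blocks, and a classification block."""

    def __init__(self, n_classes: int = 2):
        super().__init__()
        # Input processing block: a single conv stem over a 2-D speech feature
        # input (e.g., a spectrogram) is assumed here.
        self.input_block = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.BatchNorm2d(16),
            nn.ReLU(inplace=True),
        )
        # Four convolution blocks with modest channel growth (parameter-efficient).
        self.conv_blocks = nn.Sequential(
            ConvBlock(16, 32),
            ConvBlock(32, 32),
            ConvBlock(32, 64),
            ConvBlock(64, 64),
        )
        # Classification block: global pooling followed by a linear classifier.
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(64, n_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.input_block(x)
        x = self.conv_blocks(x)
        return self.classifier(x)


# Example: a batch of 8 single-channel feature maps (64 bins x 400 frames, assumed shape).
logits = EfficientCNN()(torch.randn(8, 1, 64, 400))
print(logits.shape)  # torch.Size([8, 2])
```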
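
The speaker-selection step quoted in the Open Datasets row (Common Voice English speakers with 3 or more contributed utterances and demographic information) might be reproduced roughly as below. The paper does not describe its tooling; pandas, the validated.tsv metadata file, and the column names used here are assumptions.

```python
# Hypothetical sketch of the Common Voice speaker-selection step: keep speakers
# with 3 or more utterances and available demographic information.
# File name and column names are assumptions, not from the paper.
import pandas as pd

cv = pd.read_csv("validated.tsv", sep="\t")  # Common Voice metadata (assumed path)
counts = cv.groupby("client_id")["path"].count()  # utterances per speaker
has_demographics = cv.dropna(subset=["age", "gender"])["client_id"].unique()
selected = counts[(counts >= 3) & (counts.index.isin(has_demographics))].index
print(f"{len(selected)} speakers selected")  # the paper reports selecting 226 speakers
```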
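
The optimization recipe quoted in the Experiment Setup row (batch size 128, Adam with α = 10⁻³ and β = [0.9, 0.999], halving the learning rate whenever validation loss does not improve, stopping once it drops below 10⁻⁵) can be sketched as a training loop. The paper does not specify a framework, so PyTorch, ReduceLROnPlateau with patience=0, and the train function and loader names below are assumptions.

```python
# Minimal training-loop sketch of the described recipe: Adam (lr=1e-3,
# betas=(0.9, 0.999)), LR halved on a validation-loss plateau, stop below 1e-5.
# Model, data loaders (batch_size=128 assumed), and device handling are placeholders.
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import ReduceLROnPlateau


def train(model, train_loader, val_loader, device="cuda", max_epochs=100):
    model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))
    # factor=0.5 halves the LR when validation loss stops improving; patience is assumed.
    scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.5, patience=0)

    for epoch in range(max_epochs):
        model.train()
        for features, labels in train_loader:
            features, labels = features.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(features), labels)
            loss.backward()
            optimizer.step()

        # Validation pass drives the plateau-based learning-rate schedule.
        model.eval()
        val_loss, n = 0.0, 0
        with torch.no_grad():
            for features, labels in val_loader:
                features, labels = features.to(device), labels.to(device)
                val_loss += criterion(model(features), labels).item() * labels.size(0)
                n += labels.size(0)
        scheduler.step(val_loss / n)

        # Stop completely once the learning rate falls below 1e-5.
        if optimizer.param_groups[0]["lr"] < 1e-5:
            break
    return model
```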