Discriminative Feature Decoupling Enhancement for Speech Forgery Detection

Authors: Yijun Bei, Xing Zhou, Erteng Liu, Yang Gao, Sen Lin, Kewei Gao, Zunlei Feng

IJCAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | extensive experiments demonstrate that DEEM achieves an accuracy improvement of over 5% on the FoR dataset compared to the state-of-the-art methods.
Researcher Affiliation | Collaboration | School of Software Technology, Zhejiang University; State Key Laboratory of Blockchain and Security, Zhejiang University; Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security; Ningbo Donghai Group Co., Ltd.
Pseudocode | No | No pseudocode or algorithm blocks were found.
Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the described methodology.
Open Datasets | Yes | Specifically, we utilize publicly available datasets, namely LibriSpeech ASR [Panayotov et al., 2015] and Nonspeech [Hu and Wang, 2010], to generate synthetic speech samples... Speech Forgery Benchmark Dataset. In the experimental section, this study utilizes two representative speech forgery detection datasets to evaluate the performance of the proposed algorithm. FoR [Reimao and Tzerpos, 2019]... ASVspoof 2019 LA [Todisco et al., 2019] serves as a dataset specifically designed for ASV anti-spoofing purposes.
Dataset Splits | No | The paper mentions 'training and development stages' and an 'evaluation phase' for ASVspoof 2019 LA, and the 'standard version' of FoR, but does not provide specific percentages or sample counts for train/validation/test splits, nor does it specify how these splits are performed.
Hardware Specification | Yes | The proposed DEEM model is implemented in PyTorch and evaluated on an NVIDIA Tesla V100 GPU.
Software Dependencies | No | The paper mentions 'PyTorch' but does not specify a version number or other software dependencies with version numbers.
Experiment Setup | Yes | In the decoupled training phase, a learning rate adjustment strategy is employed, with an initial learning rate set to 0.0001. If the loss value does not exhibit a significant reduction after five consecutive training iterations, the learning rate is decreased. The Adam optimizer is utilized, and the training process is carried out for 150 epochs, employing the mean squared error (MSE) loss function. In the subsequent classification training phase, the same learning rate and optimizer settings are applied. The training is performed for 160 epochs, employing the cross-entropy loss function.
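
The reported schedule maps directly onto standard PyTorch components. The sketch below is a minimal, hypothetical reconstruction of the two-phase training setup, assuming placeholder models and data loaders (`decoupling_model`, `classifier_model`, `decouple_loader`, `class_loader`) that are not part of any released code; it only reflects the hyperparameters quoted above (Adam, initial LR 0.0001, LR reduction after five non-improving steps, 150 epochs with MSE, then 160 epochs with cross-entropy).

```python
# Hypothetical sketch of the two-phase training schedule described above.
# Model and loader names are placeholders; no official implementation is released.
import torch
import torch.nn as nn
from torch.optim import Adam
from torch.optim.lr_scheduler import ReduceLROnPlateau


def train_phase(model, loader, criterion, epochs, device="cuda"):
    """Run one training phase with the optimizer settings reported in the paper."""
    optimizer = Adam(model.parameters(), lr=1e-4)  # initial learning rate 0.0001
    # Decrease the LR when the loss shows no significant reduction for 5 consecutive checks.
    scheduler = ReduceLROnPlateau(optimizer, mode="min", patience=5)
    model.to(device).train()
    for _ in range(epochs):
        epoch_loss = 0.0
        for inputs, targets in loader:
            inputs, targets = inputs.to(device), targets.to(device)
            optimizer.zero_grad()
            loss = criterion(model(inputs), targets)
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item()
        scheduler.step(epoch_loss)
    return model


# Phase 1: decoupled training, 150 epochs, MSE loss.
# train_phase(decoupling_model, decouple_loader, nn.MSELoss(), epochs=150)
# Phase 2: classification training, 160 epochs, cross-entropy loss.
# train_phase(classifier_model, class_loader, nn.CrossEntropyLoss(), epochs=160)
```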