Speech-Forensics: Towards Comprehensive Synthetic Speech Dataset Establishment and Analysis

Authors: Zhoulin Ji, Chenhao Lin, Hang Wang, Chao Shen

IJCAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our model achieves an average mAP of 83.55% and an EER of 5.25% at the utterance level. At the segment level, it attains an EER of 1.07% and a 92.19% F1 score. These results highlight the model's robust capability for a comprehensive analysis of synthetic speech, offering a promising avenue for future research and practical applications in this field.
Researcher Affiliation | Academia | Zhoulin Ji (1), Chenhao Lin (2), Hang Wang (3,4) and Chao Shen (2); (1) School of Software Engineering, Xi'an Jiaotong University; (2) School of Cyber Science and Engineering, Xi'an Jiaotong University; (3) School of Automation Science and Engineering, Xi'an Jiaotong University; (4) Department of Computing, The Hong Kong Polytechnic University
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | The dataset and code are available at https://github.com/ring-zl/Speech-Forensics.
Open Datasets | Yes | We utilized the LJ Speech dataset, a public domain collection of 13,100 audio clips with transcripts, read by a single speaker. (https://keithito.com/LJ-Speech-Dataset/)
Dataset Splits | No | The paper mentions 'mini-batch training' and that 'training included a warmup phase' but does not provide specific details on training/validation/test dataset splits (e.g., percentages, sample counts, or explicit standard split references).
Hardware Specification | Yes | Our experiments were conducted on a NVIDIA 4090 GPU.
Software Dependencies | No | The paper mentions the use of the 'AdamW optimizer' but does not specify software dependencies with version numbers for reproducibility (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | We adopted mini-batch training using the AdamW [Loshchilov and Hutter, 2017] optimizer. The training included a warmup phase of 5 epochs, followed by a cosine decay schedule for the learning rate. The initial learning rate was set at 1e-3, with a weight decay of 1e-3.
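
The Experiment Setup row is the one part of the paper that fully specifies optimizer and schedule, so a minimal sketch is possible. The snippet below assumes a PyTorch implementation (the paper does not name a framework); the model, total epoch count, and training loop are placeholders, while the AdamW settings, 5-epoch warmup, and cosine decay follow the quoted values.

```python
# Minimal sketch of the reported training configuration (PyTorch assumed).
# Only the optimizer, warmup length, LR, weight decay, and cosine decay come
# from the paper; everything else here is a placeholder.
import math
import torch

model = torch.nn.Linear(128, 2)     # placeholder for the Speech-Forensics model
total_epochs = 100                  # assumption: total epoch count is not stated
warmup_epochs = 5                   # reported warmup phase
base_lr, weight_decay = 1e-3, 1e-3  # reported initial LR and weight decay

optimizer = torch.optim.AdamW(model.parameters(), lr=base_lr, weight_decay=weight_decay)

def lr_lambda(epoch: int) -> float:
    """Linear warmup for the first 5 epochs, then cosine decay toward zero."""
    if epoch < warmup_epochs:
        return (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

for epoch in range(total_epochs):
    # ... mini-batch training over the dataset would go here ...
    optimizer.step()   # placeholder; real code steps once per mini-batch
    scheduler.step()   # advance the warmup + cosine schedule once per epoch
```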
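For context on the metrics quoted in the Research Type row, the sketch below shows the conventional ROC-based computation of the equal error rate (EER). This is the standard definition, not code from the paper, and the labels/scores are synthetic placeholders; it requires numpy and scikit-learn.

```python
# Standard ROC-based EER computation (not taken from the paper's code).
import numpy as np
from sklearn.metrics import roc_curve

def equal_error_rate(labels: np.ndarray, scores: np.ndarray) -> float:
    """EER = operating point where the false-accept and false-reject rates meet."""
    fpr, tpr, _ = roc_curve(labels, scores)   # labels: 1 = synthetic, 0 = genuine
    fnr = 1.0 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))     # threshold closest to FAR == FRR
    return float((fpr[idx] + fnr[idx]) / 2.0)

# Toy usage with random scores (illustrative only).
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=1000)
scores = labels + rng.normal(scale=0.8, size=1000)
print(f"EER: {equal_error_rate(labels, scores):.4f}")
```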