Do You Remember? Overcoming Catastrophic Forgetting for Fake Audio Detection

Authors: Xiaohui Zhang, Jiangyan Yi, Jianhua Tao, Chenglong Wang, Chu Yuan Zhang

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5. Experiments
Researcher Affiliation | Academia | 1State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China 2School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China 3Department of Automation, Tsinghua University, Beijing, China 4University of Science and Technology of China, Beijing, China.
Pseudocode | Yes | Algorithm 1 Regularized Adaptive Weight Modification
Open Source Code | No | The code of our method has been released in Regularized Adaptive Weight Modification.
Open Datasets | Yes | We conduct our experiments on four fake audio datasets, including the ASVspoof2019LA (S), ASVspoof2015 (T1), VCC2020 (T2), and In-the-Wild (T3).
Dataset Splits | Yes | We divide the genuine and fake audios of the VCC2020 dataset into four subsets. A quarter is used to build the evaluation set, a quarter to build the development set, and the rest is used as the training set. The In-the-Wild dataset is divided in the same way as the VCC2020.
Hardware Specification | No | The paper describes the model architecture and training details, but does not provide specific hardware specifications such as GPU or CPU models used for the experiments.
Software Dependencies | No | We use the pre-trained model Wav2vec 2.0 (Baevski et al., 2020) as the feature extractor and the self-attention convolutional neural network (S-CNN) as the classifier. The parameters of Wav2vec 2.0 are loaded from the pre-trained model XLSR-53 (Conneau et al., 2020). The parameters are trained by the Adam optimizer.
Experiment Setup | Yes | We fine-tune the model weights including the pre-trained model XLSR-53 and the classifier S-CNN. All of the parameters are trained by the Adam optimizer with a batch size of 2 and a learning rate γ of 0.0001. The constants m and Treg in RAWM are set to 0.1 and 2, respectively. The α is initialized to 0.00001 for convolution layers, 0.0001 for the self-attention layer, and 0.1 for fully connected layers.
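The dataset split and experiment setup reported above can be collected into a minimal sketch. The helper names (quarter_split, rawm_alpha_init, TRAIN_CONFIG) are hypothetical, introduced only to illustrate the reported values; the paper's actual implementation is not shown in this table.

```python
# Hypothetical sketch of the reported setup; only the numeric values
# (split ratios, hyperparameters) come from the paper's description.

def quarter_split(items):
    """Split a dataset list as described for VCC2020 and In-the-Wild:
    one quarter for evaluation, one quarter for development, the rest
    for training."""
    q = len(items) // 4
    eval_set = items[:q]
    dev_set = items[q:2 * q]
    train_set = items[2 * q:]
    return eval_set, dev_set, train_set

def rawm_alpha_init(layer_type):
    """Initial alpha per layer type, as reported in the experiment setup."""
    return {
        "conv": 0.00001,          # convolution layers
        "self_attention": 0.0001, # self-attention layer
        "fc": 0.1,                # fully connected layers
    }[layer_type]

# Reported training hyperparameters (optimizer and RAWM constants).
TRAIN_CONFIG = {
    "optimizer": "Adam",
    "batch_size": 2,
    "learning_rate": 0.0001,  # gamma
    "rawm_m": 0.1,            # constant m in RAWM
    "rawm_T_reg": 2,          # T_reg in RAWM
}
```

Applied to a toy list of 8 utterances, quarter_split returns 2 evaluation items, 2 development items, and 4 training items, matching the quarter/quarter/half division described for VCC2020.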