Do You Remember? Overcoming Catastrophic Forgetting for Fake Audio Detection
Authors: Xiaohui Zhang, Jiangyan Yi, Jianhua Tao, Chenglong Wang, Chu Yuan Zhang
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5. Experiments |
| Researcher Affiliation | Academia | 1State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China 2School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China 3Department of Automation, Tsinghua University, Beijing, China 4University of Science and Technology of China, Beijing, China. |
| Pseudocode | Yes | Algorithm 1 Regularized Adaptive Weight Modification |
| Open Source Code | No | The code of our method has been released in Regularized Adaptive Weight Modification. |
| Open Datasets | Yes | We conduct our experiments on four fake audio datasets, including the ASVspoof2019LA (S), ASVspoof2015 (T1), VCC2020 (T2), and In-the-Wild (T3). |
| Dataset Splits | Yes | We divide the genuine and fake audios of the VCC2020 dataset into four subsets. A quarter is used to build the evaluation set, a quarter to build the development set, and the rest to be used as the training set. The In-the-Wild dataset is divided in the same way as the VCC2020. |
| Hardware Specification | No | The paper describes the model architecture and training details, but does not provide specific hardware specifications such as GPU or CPU models used for the experiments. |
| Software Dependencies | No | We use the pre-trained model Wav2vec 2.0 (Baevski et al., 2020) as the feature extractor and the self-attention convolutional neural network (S-CNN) as the classifier. The parameters of Wav2vec 2.0 are loaded from the pre-trained model XLSR-53 (Conneau et al., 2020). The parameters are trained by the Adam optimizer. |
| Experiment Setup | Yes | We fine-tune the model weights, including the pre-trained model XLSR-53 and the classifier S-CNN. All of the parameters are trained by the Adam optimizer with a batch size of 2 and a learning rate γ of 0.0001. The constants m and Treg in RAWM are set to 0.1 and 2, respectively. The α is initialized to 0.00001 for convolution layers, 0.0001 for the self-attention layer, and 0.1 for fully connected layers. |
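The dataset split described in the table (a quarter of each target dataset for evaluation, a quarter for development, and the remaining half for training) could be sketched as below. This is a minimal illustration, not the authors' code: the function name, the use of a seeded shuffle, and the handling of genuine/fake labels are all assumptions.

```python
import random

def quarter_split(samples, seed=0):
    """Split a list of samples into train/dev/eval sets in a 2:1:1 ratio,
    matching the quarter/quarter/half split described for VCC2020 and
    In-the-Wild. Shuffling with a fixed seed is an assumption for
    reproducibility; the paper does not specify how samples were assigned."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    q = len(shuffled) // 4          # size of one quarter
    eval_set = shuffled[:q]         # first quarter -> evaluation
    dev_set = shuffled[q:2 * q]     # second quarter -> development
    train_set = shuffled[2 * q:]    # remaining half -> training
    return train_set, dev_set, eval_set

# Example with 100 placeholder sample IDs: 50 train, 25 dev, 25 eval.
train, dev, eval_ = quarter_split(range(100))
print(len(train), len(dev), len(eval_))
```

In practice the split would be applied separately to the genuine and fake audio pools so that both classes appear in each subset.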