Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Int*-Match: Balancing Intra-Class Compactness and Inter-Class Discrepancy for Semi-Supervised Speaker Recognition
Authors: Xingmei Wang, Jinghan Liu, Jiaxiang Meng, Boquan Li, Zijian Liu
AAAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our systematic experiments demonstrate the superiority of Int*-Match, presenting an outstanding Equal Error Rate (EER) of 1.00% on the VoxCeleb1 original test set, which is merely 0.06% below the performance achieved by fully supervised learning. |
| Researcher Affiliation | Academia | Xingmei Wang, Jinghan Liu, Jiaxiang Meng*, Boquan Li, Zijian Liu College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China EMAIL |
| Pseudocode | No | The paper describes the methodology using mathematical equations and textual descriptions in Section 3, 'Methodology', but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code https://github.com/LiuJinghan2001/IntMatch |
| Open Datasets | Yes | Dataset. We use the most typical VoxCeleb2 (Chung, Nagrani, and Zisserman 2018) for training... On the other hand, we use the Original, Extended, and Hard VoxCeleb1 test sets (Nagrani, Chung, and Zisserman 2017; Nagrani et al. 2020) for evaluation. ... What's more, we use MUSAN (Snyder, Chen, and Povey 2015) and RIR (Ko et al. 2017) datasets for data augmentation. |
| Dataset Splits | Yes | We follow the settings of threshold-based SSL methods (Zhang et al. 2021), selecting 2, 4, 10, and 20 utterances per class as labeled data, with the remaining data used as unlabeled data. It is worth noting that choosing 20 utterances per class represents 11% of the training dataset. Furthermore, additional experiments are conducted in Table 2 by selecting 20%, 30%, 40%, and 50% of utterances from each class proportionally, enabling comparisons with fully supervised learning. |
| Hardware Specification | No | The paper discusses the model architecture (ECAPATDNN) and input features, but does not provide specific hardware details such as GPU models, CPU types, or memory used for the experiments. |
| Software Dependencies | No | The paper mentions using the Adam optimizer and AAM-softmax loss, but does not provide specific version numbers for any programming languages, libraries, or frameworks used in the implementation. |
| Experiment Setup | Yes | The labeled batch size is set to 150, and the unlabeled batch size is the same, with a total training step of 560k. The network parameters are optimized by the Adam optimizer (Kingma and Ba 2015), where the initial learning rate is set to 0.001, which decreases by 3% every 7k iterations, roughly one epoch of unlabeled data. We use AAM-softmax loss as the loss function, with the margin as 0.2 and the scale as 30. ... For Int*-Match, we set m and τ to 0.999 and 0.65, respectively. |
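To make the reported hyperparameters concrete, the two numeric recipes in the setup row, the step-wise learning-rate decay (3% every 7k iterations) and the AAM-softmax margin penalty (margin 0.2, scale 30), can be sketched in plain Python. This is an illustrative reconstruction from the quoted values, not the authors' implementation; the function names are hypothetical.

```python
import math

def lr_at_step(step, base_lr=1e-3, decay=0.97, interval=7000):
    """Learning rate after `step` iterations: multiply by 0.97 every 7k steps."""
    return base_lr * decay ** (step // interval)

def aam_softmax_logit(cos_theta, margin=0.2, scale=30.0):
    """AAM-softmax logit for the target class: s * cos(theta + m).

    The additive angular margin m penalizes the target-class similarity,
    enforcing intra-class compactness; scale s sharpens the softmax.
    """
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))
    return scale * math.cos(theta + margin)
```

For example, the learning rate stays at 0.001 for the first 7k steps and drops to 0.00097 at step 7000; a target-class cosine similarity of cos(0.3) is scored as 30·cos(0.3 + 0.2), strictly less than the unpenalized 30·cos(0.3).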