Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Int*-Match: Balancing Intra-Class Compactness and Inter-Class Discrepancy for Semi-Supervised Speaker Recognition
Authors: Xingmei Wang, Jinghan Liu, Jiaxiang Meng, Boquan Li, Zijian Liu
AAAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our systematic experiments demonstrate the superiority of Int*-Match, presenting an outstanding Equal Error Rate (EER) of 1.00% on the VoxCeleb1 original test set, which is merely 0.06% below the performance achieved by fully supervised learning. |
| Researcher Affiliation | Academia | Xingmei Wang, Jinghan Liu, Jiaxiang Meng*, Boquan Li, Zijian Liu College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China EMAIL |
| Pseudocode | No | The paper describes the methodology using mathematical equations and textual descriptions in Section 3, 'Methodology', but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code https://github.com/LiuJinghan2001/IntMatch |
| Open Datasets | Yes | Dataset. We use the most typical VoxCeleb2 (Chung, Nagrani, and Zisserman 2018) for training... On the other hand, we use the Original, Extended, and Hard VoxCeleb1 test sets (Nagrani, Chung, and Zisserman 2017; Nagrani et al. 2020) for evaluation. ... What's more, we use MUSAN (Snyder, Chen, and Povey 2015) and RIR (Ko et al. 2017) datasets for data augmentation. |
| Dataset Splits | Yes | We follow the settings of threshold-based SSL methods (Zhang et al. 2021), selecting 2, 4, 10, and 20 utterances per class as labeled data, with the remaining data used as unlabeled data. It is worth noting that choosing 20 utterances per class represents 11% of the training dataset. Furthermore, additional experiments are conducted in Table 2 by selecting 20%, 30%, 40%, and 50% of utterances from each class proportionally, enabling comparisons with fully supervised learning. |
| Hardware Specification | No | The paper discusses the model architecture (ECAPATDNN) and input features, but does not provide specific hardware details such as GPU models, CPU types, or memory used for the experiments. |
| Software Dependencies | No | The paper mentions using the Adam optimizer and AAM-softmax loss, but does not provide specific version numbers for any programming languages, libraries, or frameworks used in the implementation. |
| Experiment Setup | Yes | The labeled batch size is set to 150, and the unlabeled batch size is the same, with a total training step of 560k. The network parameters are optimized by the Adam optimizer (Kingma and Ba 2015), where the initial learning rate is set to 0.001, which decreases by 3% every 7k iterations, roughly one epoch of unlabeled data. We use AAM-softmax loss as the loss function, with the margin as 0.2 and the scale as 30. ... For Int*-Match, we set m and τ to 0.999 and 0.65, respectively. |
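To make the reported hyperparameters concrete, the two numeric recipes in the setup row, the step-wise learning-rate decay (3% every 7k iterations) and the AAM-softmax margin penalty (margin 0.2, scale 30), can be sketched in plain Python. This is an illustrative reconstruction from the quoted values, not the authors' implementation; the function names are hypothetical.

```python
import math

def lr_at_step(step, base_lr=1e-3, decay=0.97, interval=7000):
    """Learning rate after `step` iterations: multiply by 0.97 every 7k steps."""
    return base_lr * decay ** (step // interval)

def aam_softmax_logit(cos_theta, margin=0.2, scale=30.0):
    """AAM-softmax logit for the target class: s * cos(theta + m).

    The additive angular margin m penalizes the target-class similarity,
    enforcing intra-class compactness; scale s sharpens the softmax.
    """
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))
    return scale * math.cos(theta + margin)
```

For example, the learning rate stays at 0.001 for the first 7k steps and drops to 0.00097 at step 7000; a target-class cosine similarity of cos(0.3) is scored as 30·cos(0.3 + 0.2), strictly less than the unpenalized 30·cos(0.3).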