Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Self-Supervised Disentangled Representation Learning for Robust Target Speech Extraction
Authors: Zhaoxi Mu, Xinyu Yang, Sining Sun, Qing Yang
AAAI 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results substantiate the effectiveness of our meticulously crafted approach, showcasing a substantial reduction in the likelihood of speaker confusion. Experiments Datasets and Implementation Details The TSE model is trained and evaluated using the widely-used two-speaker mixed dataset WSJ0-2mix (Hershey et al. 2016) and its derivative dataset WSJ0-2mix-extr (Xu et al. 2020). |
| Researcher Affiliation | Collaboration | Zhaoxi Mu¹, Xinyu Yang¹, Sining Sun², Qing Yang²; ¹Xi'an Jiaotong University, ²Du Xiaoman; EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: SDR-TSE Optimization. Require: The training data $D$ containing mixed-target-reference speech triplets $(y, u, x)$. 1: Initialize the entire system randomly. 2: while not converged do 3: Sample $\{(y_i, u_i, x_i)\}_{i=1}^{N}$ from $D$. 4: Forward-Propagation 5: Reconstruct the spectrogram $\{\hat{X}_i\}_{i=1}^{N}$ of $\{x_i\}_{i=1}^{N}$ and predict the target speech $\{\hat{u}_i\}_{i=1}^{N}$. 6: Back-Propagation 7: Update $\theta_V$ by maximizing $\mathcal{L}_{LL}$. 8: Update $\theta_{E_g}$, $\theta_{E_c}$, $\theta_D$, $\theta_G$ and $\theta_F$ by minimizing $\mathcal{L}_{KL}$, $\mathcal{L}_{REC}$, $I_{vCLUB}$ and $\mathcal{L}_{SI-SNR}$. 9: Update $\theta_G$ and $\theta_{E_g}$ by minimizing $\mathcal{L}_{SIM}$. 10: end while |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | The TSE model is trained and evaluated using the widely-used two-speaker mixed dataset WSJ0-2mix (Hershey et al. 2016) and its derivative dataset WSJ0-2mix-extr (Xu et al. 2020). |
| Dataset Splits | No | The paper states that the model is "trained and evaluated" on WSJ0-2mix and WSJ0-2mix-extr datasets but does not explicitly describe training, validation, and test dataset splits with percentages, counts, or explicit labels like "validation set". |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU or CPU models, memory, or cloud instance types used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies or versions (e.g., programming language versions, library versions, or specific solver versions) used in the experiments. |
| Experiment Setup | Yes | The dimensions $d_g$, $d_c$, $d_s$, and $H$ are all set to 256. The variational approximation network $V$ is implemented using two four-layer fully connected networks to predict the mean and variance of the posterior distribution, respectively. The model encompasses a total of 45M parameters. The weights of $\mathcal{L}_{SI-SNR}$, $\mathcal{L}_{REC}$, $\mathcal{L}_{KL}$, $I_{vCLUB}$, $\mathcal{L}_{LL}$ and $\mathcal{L}_{SIM}$ are set to 1, $10^{-3}$, $10^{-4}$, $10^{-4}$, $10^{-3}$ and $10^{-3}$, respectively, determined through a grid search. $L$, $O$, and $\eta$ are set to 250 ms, 125 ms, and 5% of the maximum energy, respectively. |
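
The excerpts above name an SI-SNR loss among the training objectives, but this page does not reproduce its formula and the paper releases no code. The following is a minimal sketch of the standard scale-invariant signal-to-noise ratio, which may differ in detail from the authors' implementation. The function names `si_snr` and `si_snr_loss` and the `eps` stabilizer are assumptions introduced here for illustration.

```python
import math


def si_snr(estimate, target, eps=1e-8):
    """Standard scale-invariant SNR in dB between two equal-length waveforms.

    Hypothetical helper: the name, signature, and eps value are assumptions,
    not taken from the paper.
    """
    if len(estimate) != len(target):
        raise ValueError("estimate and target must have the same length")
    n = len(target)

    # Remove the mean so a constant offset does not affect the score.
    est_mean = sum(estimate) / n
    tgt_mean = sum(target) / n
    est = [e - est_mean for e in estimate]
    tgt = [t - tgt_mean for t in target]

    # Project the estimate onto the target: this is the "signal" part.
    dot = sum(e * t for e, t in zip(est, tgt))
    tgt_energy = sum(t * t for t in tgt) + eps
    scale = dot / tgt_energy
    s_target = [scale * t for t in tgt]

    # Whatever remains after the projection is treated as noise.
    e_noise = [e - s for e, s in zip(est, s_target)]

    signal_energy = sum(s * s for s in s_target)
    noise_energy = sum(x * x for x in e_noise)
    return 10.0 * math.log10((signal_energy + eps) / (noise_energy + eps))


def si_snr_loss(estimate, target):
    """Negated SI-SNR, so that minimizing the loss maximizes SI-SNR."""
    return -si_snr(estimate, target)
```

Because the estimate is projected onto the target before the ratio is taken, rescaling the estimate leaves the score essentially unchanged, which is why the metric is called scale-invariant.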