Self-Supervised Disentangled Representation Learning for Robust Target Speech Extraction

Authors: Zhaoxi Mu, Xinyu Yang, Sining Sun, Qing Yang

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results substantiate the effectiveness of our meticulously crafted approach, showcasing a substantial reduction in the likelihood of speaker confusion." (Experiments, Datasets and Implementation Details:) "The TSE model is trained and evaluated using the widely-used two-speaker mixed dataset WSJ0-2mix (Hershey et al. 2016) and its derivative dataset WSJ0-2mix-extr (Xu et al. 2020)."
Researcher Affiliation | Collaboration | Zhaoxi Mu1, Xinyu Yang1, Sining Sun2, Qing Yang2; 1Xi'an Jiaotong University, 2Du Xiaoman; wsmzxxh@stu.xjtu.edu.cn, yxyphd@mail.xjtu.edu.cn, {sunsining,yangqing}@duxiaoman.com
Pseudocode | Yes |
Algorithm 1: SDR-TSE Optimization
Require: The training data D containing mixed-target-reference speech triplets (y, u, x).
1: Initialize the entire system randomly.
2: while not converged do
3:   Sample {(y_i, u_i, x_i)}_{i=1}^N from D.
4:   // Forward propagation
5:   Reconstruct the spectrograms {X̂_i}_{i=1}^N of {x_i}_{i=1}^N and predict the target speech {û_i}_{i=1}^N.
6:   // Back propagation
7:   Update θ_V by maximizing L_LL.
8:   Update θ_Eg, θ_Ec, θ_D, θ_G, and θ_F by minimizing L_KL, L_REC, I^v_CLUB, and L_SI-SNR.
9:   Update θ_G and θ_Eg by minimizing L_SIM.
10: end while
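Read as a training loop, Algorithm 1 alternates a gradient-ascent step (maximizing L_LL for the variational network V) with gradient-descent steps for the remaining parameter groups. The sketch below is a toy illustration of that maximize-then-minimize schedule only: the scalar "parameter groups" and quadratic stand-in losses are assumptions for demonstration, not the paper's actual objectives or model.

```python
# Toy sketch of Algorithm 1's alternating update schedule.
# theta_V / theta_rest / theta_G stand in for the parameter groups;
# the quadratic losses below are illustrative placeholders, NOT the
# paper's L_LL, L_KL, L_REC, I^v_CLUB, L_SI-SNR, or L_SIM.

def sgd_step(theta, grad, lr):
    """One gradient-descent step on a scalar parameter."""
    return theta - lr * grad

def train(steps=500, lr=0.1):
    theta_V, theta_rest, theta_G = 5.0, 5.0, 5.0  # placeholder init
    for _ in range(steps):
        # Step 7: update theta_V by MAXIMIZING its objective -> ascent.
        # Toy objective: -(theta_V - 1)^2, maximized at theta_V = 1.
        grad_LL = -2.0 * (theta_V - 1.0)
        theta_V = theta_V + lr * grad_LL  # ascent: move WITH the gradient
        # Step 8: update the remaining groups by MINIMIZING the summed
        # losses. Toy combined loss: (theta - 2)^2, minimized at 2.
        grad_combined = 2.0 * (theta_rest - 2.0)
        theta_rest = sgd_step(theta_rest, grad_combined, lr)
        # Step 9: extra minimization step for theta_G (and theta_Eg).
        # Toy similarity loss: (theta - 3)^2, minimized at 3.
        grad_SIM = 2.0 * (theta_G - 3.0)
        theta_G = sgd_step(theta_G, grad_SIM, lr)
    return theta_V, theta_rest, theta_G
```

Each parameter group converges to the optimum of its own stand-in objective, mirroring how the real system trains V adversarially against the other modules within one loop iteration.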
Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | The TSE model is trained and evaluated using the widely-used two-speaker mixed dataset WSJ0-2mix (Hershey et al. 2016) and its derivative dataset WSJ0-2mix-extr (Xu et al. 2020).
Dataset Splits | No | The paper states that the model is "trained and evaluated" on WSJ0-2mix and WSJ0-2mix-extr but does not explicitly describe training, validation, and test splits with percentages, counts, or explicit labels such as "validation set".
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU or CPU models, memory, or cloud instance types used for the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies or versions (e.g., programming-language versions, library versions, or solver versions) used in the experiments.
Experiment Setup | Yes | The dimensions d_g, d_c, d_s, and H are all set to 256. The variational approximation network V is implemented using two four-layer fully connected networks that predict the mean and variance of the posterior distribution, respectively. The model comprises 45M parameters in total. The weights of L_SI-SNR, L_REC, L_KL, I^v_CLUB, L_LL, and L_SIM are set to 1, 10^-3, 10^-4, 10^-4, 10^-3, and 10^-3, respectively, determined through a grid search. L, O, and η are set to 250 ms, 125 ms, and 5% of the maximum energy, respectively.
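For reference, the reported hyperparameters can be gathered into one configuration sketch. The key names below are hypothetical (chosen for readability); only the numeric values come from the excerpt above, and the interpretation of L, O, and η as windowing/threshold settings is an assumption.

```python
# Hypothetical config collecting the reported setup; key names are assumptions,
# values are taken from the paper excerpt.
config = {
    "dims": {"d_g": 256, "d_c": 256, "d_s": 256, "H": 256},  # all set to 256
    "num_params": 45_000_000,       # total model size (45M)
    "loss_weights": {               # determined via grid search
        "SI_SNR": 1.0,
        "REC": 1e-3,
        "KL": 1e-4,
        "CLUB": 1e-4,
        "LL": 1e-3,
        "SIM": 1e-3,
    },
    "segmentation": {               # assumed meaning of L, O, eta
        "L_ms": 250,                # window length L = 250 ms
        "O_ms": 125,                # overlap O = 125 ms
        "eta": 0.05,                # threshold: 5% of the maximum energy
    },
}
```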