Cross-Talk Reduction
Authors: Zhong-Qiu Wang, Anurag Kumar, Shinji Watanabe
IJCAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Evaluation results on a simulated two-speaker CTR task and on a real-recorded conversational speech separation and recognition task show the effectiveness and potential of CTRnet. |
| Researcher Affiliation | Collaboration | Zhong-Qiu Wang (Southern University of Science and Technology, China); Anurag Kumar (Meta Reality Labs Research, USA); Shinji Watanabe (Carnegie Mellon University, USA) |
| Pseudocode | Yes | At run time, to separate the close-talk speech of an entire session, we run CTRnet in a block-wise way, using the pseudo-code below at each processing block. (A minimal block-wise inference sketch is given after this table.) |
| Open Source Code | No | A sound demo is provided in the link below. (Footnote 2: see https://zqwang7.github.io/demos/CTRnet_demo/index.html.) This link leads to a sound demo, not the source code for the methodology presented in the paper. |
| Open Datasets | Yes | SMS-WSJ-FF-CT, with FF meaning far-field and CT close-talk, is built upon a simulated dataset named SMS-WSJ [Drude et al., 2019]... train and evaluate CTRnet using the real-recorded CHiME-7 dataset, following the setup of the CHiME-7 DASR challenge [Cornell et al., 2023]. |
| Dataset Splits | Yes | SMS-WSJ [Drude et al., 2019]... has 33,561 (≈ 87.4 h), 982 (≈ 2.5 h) and 1,332 (≈ 3.4 h) 2-speaker mixtures for training, validation and testing. ... CHiME-7 dataset... There are 14 (≈ 34 h), 2 (≈ 2 h) and 4 (≈ 5 h) recorded sessions respectively for training, validation and testing. |
| Hardware Specification | No | Experiments of this work used the Bridges2 system at PSC and Delta at NCSA through allocations CIS210014 and IRI120008P from the Advanced Cyberinfrastructure Coordination Ecosystem: Services and Support (ACCESS) program. While specific systems are named, the paper does not specify the exact GPU models, CPU models, or memory configurations of these resources, which would be needed for a full hardware specification. |
| Software Dependencies | No | The paper mentions several software components like "TF-GridNet [Wang et al., 2023c]", the "torchiva toolkit [Scheibler and Saijo, 2022]", and "WavLM features", but it does not provide specific version numbers for these or other crucial software dependencies (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | For training, by default we sample an L-second segment from each mixture in each epoch, and the batch size is H. ... For STFT, the window size is 16 ms, the hop size 8 ms, and the square root of the Hann window is used as the analysis window. TF-GridNet [Wang et al., 2023c] is employed as the DNN architecture. Using the symbols defined in Table I of [Wang et al., 2023c], we set its hyper-parameters to D = 128, B = 4, I = 1, J = 1, H = 192, L = 4 and E = 4 (please do not confuse these symbols with the ones defined in this paper). The model has around 4.8 million parameters. ξ in (10) and (11) is tuned to 10^-3. β in (15) is set to 1.0. ... The filter taps I and J are tuned to 19 and 1. ... The processing block size is set to 8 seconds, the same as the segment length used during training. We configure the blocks to be slightly overlapped, where we consider the first and the last 0.96 seconds as context, and output the DNN estimates in the center 6.08 (= 8 − 0.96 − 0.96) seconds of each block. (Sketches of the STFT configuration and the TF-GridNet hyper-parameters follow this table.) |
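The block-wise run-time procedure quoted in the Pseudocode and Experiment Setup rows can be made concrete with a short sketch. This is a minimal illustration rather than the paper's actual pseudo-code: `model` is a hypothetical stand-in for CTRnet's forward pass, the 16 kHz sampling rate is an assumption, and the model output is assumed to have the same length as its input block. The sketch cuts a session into 8-second blocks, treats 0.96 s on each side as context, and keeps only the center 6.08 s of each block's output.

```python
import torch

def separate_session(mixture: torch.Tensor, model, sr: int = 16000) -> torch.Tensor:
    """Block-wise inference sketch: 8 s blocks, 0.96 s context per side.

    mixture: (num_samples,) session waveform.
    model:   hypothetical callable mapping an 8 s block to an
             equal-length estimate (assumption, not the paper's API).
    """
    block = int(8.0 * sr)            # 8 s processing block
    ctx = int(0.96 * sr)             # context trimmed from each side
    step = block - 2 * ctx           # 6.08 s of fresh output per block
    n = mixture.shape[-1]
    out = torch.zeros_like(mixture)
    pos = -ctx                       # first block starts 0.96 s before t = 0
    while pos + ctx < n:
        seg = mixture[max(pos, 0): pos + block]
        # Zero-pad at the session edges so every block is exactly 8 s long.
        pad_l = max(-pos, 0)
        pad_r = block - pad_l - seg.shape[-1]
        seg = torch.nn.functional.pad(seg, (pad_l, pad_r))
        est = model(seg)
        # Keep only the center portion, discarding the context on both sides.
        lo, hi = pos + ctx, min(pos + ctx + step, n)
        out[lo:hi] = est[ctx: ctx + (hi - lo)]
        pos += step
    return out

# Identity "model" just to show the call shape; a 30 s session round-trips.
session = torch.randn(16000 * 30)
assert torch.equal(separate_session(session, lambda x: x), session)
```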
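Likewise, the quoted STFT settings translate directly into a `torch.stft` call. A minimal sketch, assuming 16 kHz audio for illustration; at other sampling rates the window and hop sizes in samples scale accordingly.

```python
import torch

SR = 16000                               # assumed sampling rate
WIN = int(0.016 * SR)                    # 16 ms window = 256 samples
HOP = int(0.008 * SR)                    # 8 ms hop = 128 samples
WINDOW = torch.hann_window(WIN).sqrt()   # square root of the Hann window

def stft(wave: torch.Tensor) -> torch.Tensor:
    """(..., num_samples) -> complex spectrogram (..., freq, frames)."""
    return torch.stft(wave, n_fft=WIN, hop_length=HOP, win_length=WIN,
                      window=WINDOW, return_complex=True)

spec = stft(torch.randn(SR * 8))         # one 8 s training-length segment
print(spec.shape)                        # torch.Size([129, 1001])
```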
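For reference, the quoted TF-GridNet hyper-parameters can be gathered into a single configuration mapping. The symbol names follow Table I of [Wang et al., 2023c] as quoted above; how a TF-GridNet implementation consumes them is not shown here.

```python
# TF-GridNet hyper-parameters as quoted in the Experiment Setup row
# (symbols from Table I of [Wang et al., 2023c]); per the paper, this
# configuration has around 4.8 million parameters.
TFGRIDNET_CONFIG = dict(D=128, B=4, I=1, J=1, H=192, L=4, E=4)
```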