Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].

Improving Target Sound Extraction via Disentangled Codec Representations with Privileged Knowledge Distillation

Authors: Dail Kim, Joon-Hyuk Chang

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that DCKD consistently improves existing methods across model architectures under the multi-target selection condition. Experimental results on the Kaggle2018-TAU dataset demonstrate that DCKD consistently improves separation performance across various TSE architectures under multi-target selection conditions. We constructed a synthetic dataset by combining sound events (SEs) from the Freesound Dataset Kaggle 2018 (FSD Kaggle) [47] and background sounds from the TAU Urban Acoustic Scenes 2019 dataset [48]. We evaluate model performance using signal-to-distortion Ratio (SDR) and Scale-Invariant SDR (SI-SDR) [44] improvements over the mixture signal in decibels (dB) using the BSSeval toolkit [46].
Researcher Affiliation | Academia | (1) Department of Artificial Intelligence, (2) Department of Electronic Engineering, Hanyang University, Seoul, Republic of Korea. EMAIL
Pseudocode | No | The paper describes the proposed approach using text, mathematical formulations, and block diagrams (Figure 1 and 2), but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [NA] Justification: Guidelines: The answer NA means that paper does not include experiments requiring code. Please see the NeurIPS code and data submission guidelines (https://nips.cc/public/guides/CodeSubmissionPolicy) for more details. While we encourage the release of code and data, we understand that this might not be possible, so No is an acceptable answer. Papers cannot be rejected simply for not including code, unless this is central to the contribution (e.g., for a new open-source benchmark).
Open Datasets | Yes | We constructed a synthetic dataset by combining sound events (SEs) from the Freesound Dataset Kaggle 2018 (FSD Kaggle) [47] and background sounds from the TAU Urban Acoustic Scenes 2019 dataset [48]. To validate the generalization capability of DCKD, we further conducted experiments on the ESC-50 dataset [49]. We used the AudioCap dataset [52] for training, and the test set sourced from [51].
Dataset Splits | Yes | For dataset partitioning, the train, validation, and eval splits of the FSD dataset were used for training, validation, and testing, respectively. For validation and testing, 5,000 fixed mixtures were prepared following the same generation procedure, excluding the zero target sound selection case. The dataset was split into 20, 10, and 10 clips per class for training, validation, and testing, respectively.
Hardware Specification | Yes | All experiments were conducted on a server equipped with four NVIDIA GeForce RTX 4090 GPUs. Table 6: Model size and training time comparison of teacher and student models. GPU Setup: 4 RTX 4090
Software Dependencies | No | The paper mentions using the 'Adam optimizer' and the 'Parselmouth package [50]', but it does not specify version numbers for these or for any general programming languages or deep learning frameworks used for implementation (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | We employed the Adam optimizer with an initial learning rate of 5e-4. A learning rate scheduler reduced the learning rate by a factor of 0.9 whenever the validation loss failed to improve for three consecutive epochs. The teacher and student models were trained for 500 and 300 epochs, respectively, with early stopping when the models stopped improving for 30 epochs. During model optimization, the weighting parameter of teacher loss function, λCLUB, λNCE and λcos were fixed at 1e-5, 1.0, and 10.0 and for student loss function, λfe and λre were set to 1e-4 and 0.6 to balance the values of losses.
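
The learning-rate and early-stopping rules quoted in the Experiment Setup row can be sketched in plain Python as follows. This is an illustrative reading of the quoted text, not the authors' implementation: the names PlateauTracker and simulate are hypothetical, "reduced by a factor of 0.9" is assumed to mean multiplying the rate by 0.9, "improve" is assumed to mean a strictly lower validation loss than the best seen so far, and the three-epoch counter is assumed to reset after each reduction. Framework schedulers may count patience differently.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

# Epoch budgets quoted in the Experiment Setup row.
TEACHER_MAX_EPOCHS = 500
STUDENT_MAX_EPOCHS = 300


@dataclass
class PlateauTracker:
    """Hypothetical helper tracking learning rate and early stopping."""
    lr: float = 5e-4          # initial Adam learning rate (from the text)
    factor: float = 0.9       # assumed multiplicative decay
    lr_patience: int = 3      # non-improving epochs before a decay
    stop_patience: int = 30   # non-improving epochs before stopping
    best: float = float("inf")
    bad_for_lr: int = 0       # non-improving epochs since last decay or best
    bad_for_stop: int = 0     # non-improving epochs since last best

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True to stop training."""
        if val_loss < self.best:
            # Strict improvement resets both counters.
            self.best = val_loss
            self.bad_for_lr = 0
            self.bad_for_stop = 0
            return False
        self.bad_for_lr += 1
        self.bad_for_stop += 1
        if self.bad_for_lr >= self.lr_patience:
            # Decay the rate and restart the three-epoch window (assumption).
            self.lr *= self.factor
            self.bad_for_lr = 0
        return self.bad_for_stop >= self.stop_patience


def simulate(val_losses: Iterable[float], max_epochs: int) -> Tuple[int, float]:
    """Replay a validation-loss curve; return (epochs run, final learning rate)."""
    tracker = PlateauTracker()
    epochs_run = 0
    for val_loss in val_losses:
        if epochs_run >= max_epochs:
            break
        epochs_run += 1
        if tracker.step(val_loss):
            break
    return epochs_run, tracker.lr


if __name__ == "__main__":
    # A made-up loss curve: one improvement, then a long plateau.
    curve = [1.0, 0.9] + [0.95] * 50
    print(simulate(curve, STUDENT_MAX_EPOCHS))
```

In a real training loop, step would be called once per epoch with the measured validation loss, and the returned rate would be written back into the optimizer's parameter groups.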