Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Learning Neural Vocoder from Range-Null Space Decomposition
Authors: Andong Li, Tong Lei, Zhihang Sun, Rilin Chen, Erwei Yin, Xiaodong Li, Chengshi Zheng
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experiments are conducted on the LJSpeech and LibriTTS benchmarks. Quantitative and qualitative results show that while enjoying lightweight network parameters, the proposed approach yields state-of-the-art performance among existing advanced methods. |
| Researcher Affiliation | Collaboration | (1) Institute of Acoustics, Chinese Academy of Sciences; (2) University of Chinese Academy of Sciences; (3) Tencent AI Lab; (4) Nanjing University; (5) Defense Innovation Institute, Academy of Military Sciences (AMS); (6) Tianjin Artificial Intelligence Innovation Center (TAIIC) |
| Pseudocode | No | The paper describes network architectures and processes in detail (e.g., in Section 3.3 and Figure 3) but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code and the pretrained model weights are available at https://github.com/Andong-Li-speech/RNDVoC. |
| Open Datasets | Yes | Two benchmarks are employed in this study, namely LJSpeech [Keith and Linda, 2017] and LibriTTS [Zen et al., 2019]. |
| Dataset Splits | Yes | The LJSpeech dataset includes 13,100 clean speech clips by a single female, and the sampling rate is 22.05 kHz. Following the division in the open-sourced VITS repository, {12500, 100, 500} clips are used for training, validation, and testing, respectively. The LibriTTS dataset covers diverse recording environments with the sampling rate of 24 kHz. Following the division in [Lee et al., 2023], {train-clean-100, train-clean-300, train-other-500} are for model training. The subsets dev-clean + dev-other are for objective comparisons, and test-clean + test-other are for subjective evaluations. |
| Hardware Specification | Yes | The inference speed on a CPU is evaluated based on a CPU Intel(R) Core(TM) i7-14700F. For GPU, it is based on NVIDIA GeForce RTX 4060 Ti. |
| Software Dependencies | No | The paper mentions the use of the AdamW optimizer but does not specify version numbers for any key software components or libraries like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | A batch size of 16, a segment size of 16384, and an initial learning rate of 2e-4 are used for training. The AdamW optimizer [Loshchilov and Hutter, 2017] is employed, with {β1 = 0.8, β2 = 0.99}. The generator and discriminator are updated for 1 million steps, respectively. |
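
For readers who want to carry the quoted settings into a script, the Experiment Setup and Dataset Splits rows can be captured in a small configuration object. This is a minimal sketch built only from the values quoted in the table; the names `TrainConfig`, `adamw_kwargs`, `LJSPEECH_SPLITS`, and `segment_seconds` are hypothetical and are not taken from the authors' repository, which may organize its configuration differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters quoted in the Experiment Setup row (names are illustrative)."""

    batch_size: int = 16
    segment_size: int = 16384      # audio samples per training segment
    learning_rate: float = 2e-4    # initial value; no decay schedule is quoted in the row
    beta1: float = 0.8
    beta2: float = 0.99
    total_steps: int = 1_000_000   # applies to both generator and discriminator

    def adamw_kwargs(self) -> dict:
        """Keyword arguments in the form an AdamW optimizer constructor expects."""
        return {"lr": self.learning_rate, "betas": (self.beta1, self.beta2)}


# LJSpeech figures quoted in the Dataset Splits row.
LJSPEECH_SPLITS = {"train": 12500, "validation": 100, "test": 500}
LJSPEECH_TOTAL_CLIPS = 13100
LJSPEECH_SAMPLE_RATE_HZ = 22050


def segment_seconds(cfg: TrainConfig, sample_rate_hz: int) -> float:
    """Duration in seconds of one training segment at the given sampling rate."""
    return cfg.segment_size / sample_rate_hz


if __name__ == "__main__":
    cfg = TrainConfig()
    # The three quoted split sizes should account for every clip in the dataset.
    assert sum(LJSPEECH_SPLITS.values()) == LJSPEECH_TOTAL_CLIPS
    print(cfg.adamw_kwargs())
    print(segment_seconds(cfg, LJSPEECH_SAMPLE_RATE_HZ))
```

If PyTorch is installed, the returned dictionary could be passed along the lines of `torch.optim.AdamW(model.parameters(), **cfg.adamw_kwargs())`, where `model` stands in for whatever generator or discriminator is being trained. Because the Software Dependencies row reports no library versions, any such usage is an assumption about the environment rather than a reproduction of the authors' setup.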