Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
FocalCodec: Low-Bitrate Speech Coding via Focal Modulation Networks
Authors: Luca Della Libera, Francesco Paissan, Cem Subakan, Mirco Ravanelli
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate FocalCodec on speech resynthesis, considering both English and multilingual speech. For English speech, we use LibriSpeech [47] test-clean. For multilingual speech, following [76], we randomly select 100 utterances from each of the 7 foreign languages in Multilingual LibriSpeech [51] (Dutch, French, German, Italian, Polish, Portuguese, and Spanish), resulting in a total of 700 utterances. We also consider the more realistic scenario of speech contaminated with environmental noise. ... We evaluate the models using objective metrics. ... Results are presented in Table 2. |
| Researcher Affiliation | Academia | Luca Della Libera (1, 2), Francesco Paissan (3, 2, 4), Cem Subakan (5, 1, 2), Mirco Ravanelli (1, 2). Affiliations: (1) Concordia University, (2) Mila-Quebec AI Institute, (3) Fondazione Bruno Kessler, (4) University of Trento, (5) Université Laval. |
| Pseudocode | No | The paper describes the architecture and training process using natural language descriptions, mathematical formulas for focal modulation, and block diagrams (Figure 1), but does not contain any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Demo samples and code are available at https://lucadellalib.github.io/focalcodec-web/. |
| Open Datasets | Yes | The following datasets were used in this work: LibriSpeech [47] is a large-scale corpus of English read speech derived from audiobooks in the LibriVox project. ... License: CC BY 4.0. ... LibriTTS [84] is a corpus designed for text-to-speech research... License: CC BY 4.0. ... Multilingual LibriSpeech [51] is an extension of LibriSpeech... License: CC BY 4.0. |
| Dataset Splits | Yes | LibriTTS [84] ... It consists of 585 hours of transcribed speech with predefined training, validation, and test splits. ... LibriSpeech [47] ... with predefined training, validation, and test splits. ... For ASR, we use LibriSpeech [47] train-clean-100 and train-clean-360 for training, dev-clean for validation, and test-clean for testing. |
| Hardware Specification | Yes | Each model is trained on a single GPU, with the choice between V100 GPUs (16 or 32 GB) and A100 GPUs (40 GB), depending on cluster resource availability. |
| Software Dependencies | No | Software for the experimental evaluation was implemented in Python using the SpeechBrain [55, 54] toolkit. ... We use the AdamW [40] optimizer... The paper mentions Python and the SpeechBrain toolkit, but does not provide specific version numbers for these or other critical software dependencies like PyTorch/TensorFlow or CUDA. |
| Experiment Setup | Yes | We use a weight of 1.0 for the reconstruction loss and a weight of 0.1 for the entropy loss. We use the AdamW [40] optimizer with an initial learning rate of 0.0005, β1 of 0.8, β2 of 0.99, and weight decay of 0.01. The learning rate is reduced by a factor of 0.9 if validation loss does not improve within a margin of 0.0025. Gradients are clipped to a maximum L2 norm of 5. ... For the STFT, we set the FFT size to 1024 samples and the hop length to 320. |
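
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration, which may help when attempting to reproduce the training recipe. The following is a minimal sketch in plain Python, not the authors' implementation: the names `TrainConfig`, `total_loss`, `step_lr`, and `clip_scale` are hypothetical, and the plateau rule assumes that "improve" means beating the best validation loss by more than the margin, with no patience period, which the quoted text does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values quoted in the Experiment Setup row; field names are hypothetical.
    recon_weight: float = 1.0
    entropy_weight: float = 0.1
    learning_rate: float = 0.0005
    beta1: float = 0.8
    beta2: float = 0.99
    weight_decay: float = 0.01
    lr_decay_factor: float = 0.9
    improvement_margin: float = 0.0025
    max_grad_norm: float = 5.0
    fft_size: int = 1024
    hop_length: int = 320


def total_loss(recon_loss: float, entropy_loss: float, cfg: TrainConfig) -> float:
    """Weighted sum of the reconstruction and entropy losses."""
    return cfg.recon_weight * recon_loss + cfg.entropy_weight * entropy_loss


def step_lr(lr: float, best_val_loss: float, val_loss: float, cfg: TrainConfig):
    """Return (new_lr, new_best_val_loss) after one validation pass.

    Assumption: the loss must drop below the best value by more than the
    margin to count as an improvement; otherwise the rate decays at once.
    """
    if val_loss < best_val_loss - cfg.improvement_margin:
        return lr, val_loss
    return lr * cfg.lr_decay_factor, best_val_loss


def clip_scale(grad_norm: float, cfg: TrainConfig) -> float:
    """Factor that rescales gradients so their L2 norm is at most max_grad_norm."""
    if grad_norm <= 0:
        return 1.0
    return min(1.0, cfg.max_grad_norm / grad_norm)
```

In a PyTorch-based setup such as SpeechBrain, these values would normally be passed to the framework's own optimizer, scheduler, and gradient-clipping utilities rather than reimplemented; the functions above only make the arithmetic of the quoted settings explicit.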