Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Speech Watermarking with Discrete Intermediate Representations
Authors: Shengpeng Ji, Ziyue Jiang, Jialong Zuo, Minghui Fang, Yifu Chen, Tao Jin, Zhou Zhao
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that our framework achieves state-of-the-art performance in robustness and imperceptibility, simultaneously. Moreover, our flexible frame-wise approach can serve as an efficient solution for both voice cloning detection and information hiding. Additionally, DiscreteWM can encode 1 to 150 bits of watermark information within a 1-second speech clip, indicating its encoding capacity. |
| Researcher Affiliation | Academia | Zhejiang University EMAIL |
| Pseudocode | No | The paper describes methods using prose and mathematical formulations, but no clearly labeled 'Pseudocode' or 'Algorithm' blocks are present. |
| Open Source Code | No | Demo: https://DiscreteWM.github.io/discrete_wm. This link points to a demonstration page, not explicitly to a source code repository for the methodology described in the paper. |
| Open Datasets | Yes | Datasets. For training, we employ the standard training set of LibriTTS (Zen et al. 2019), which contains approximately 585 hours of English speech at 24 kHz sampling rate. |
| Dataset Splits | Yes | For training, we employ the standard training set of LibriTTS (Zen et al. 2019)... We randomly select 100 text transcriptions and 100 speech prompts from the LibriTTS test-clean set... The test set also includes all of the speech samples from the test-clean set of LibriTTS. |
| Hardware Specification | Yes | The RTF (Real-Time Factor) evaluation is conducted with 1 NVIDIA A100 GPU and batch size 1. |
| Software Dependencies | No | The paper mentions software components and techniques like STFT, VQ-VAE, and GANs, but it does not specify any version numbers for libraries or tools used for implementation. |
| Experiment Setup | Yes | For the Short-Time Fourier Transform operation (STFT), we adopt a filter length of 400, a hop length of 80, and a window function applied to each frame with a length of 400... λ_adv is the hyper-parameter to balance the three terms, which is set to 10 2... We set the watermark ratio m of DiscreteWM to 10%. |
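
The STFT settings quoted in the Experiment Setup row can be made concrete with a small sketch. This is not the authors' implementation, which is not publicly linked. It is a minimal NumPy illustration of the frame arithmetic those numbers imply, assuming a Hann window and no edge padding (the quoted text names neither), and the function name `stft_magnitude` is hypothetical.

```python
import numpy as np

SAMPLE_RATE = 24_000  # sampling rate quoted in the Open Datasets row
N_FFT = 400           # "filter length of 400"
HOP = 80              # "hop length of 80"
WIN = 400             # window length of 400

def stft_magnitude(wave: np.ndarray) -> np.ndarray:
    """Return |STFT| with shape (frames, N_FFT // 2 + 1).

    Assumptions (not stated in the source): Hann window, no padding,
    and an input of at least WIN samples.
    """
    window = np.hanning(WIN)
    n_frames = 1 + (len(wave) - WIN) // HOP
    # Slice overlapping frames of WIN samples, advancing HOP samples each step.
    frames = np.stack([wave[i * HOP : i * HOP + WIN] for i in range(n_frames)])
    # Real FFT of each windowed frame, keeping only the magnitude.
    return np.abs(np.fft.rfft(frames * window, n=N_FFT, axis=1))

# One second of synthetic noise stands in for a speech clip.
one_second = np.random.default_rng(0).standard_normal(SAMPLE_RATE)
spec = stft_magnitude(one_second)
print(spec.shape)
```

Under these assumptions each frame advances by 80 samples, roughly 3.3 ms at 24 kHz, and each frame yields 201 frequency bins. A padded STFT, which many libraries use by default, would produce a few more frames per clip.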