Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

Robust Distortion-Free Watermark for Autoregressive Audio Generation Models

Authors: Yihan Wu, Georgios Milis, Ruibo Chen, Heng Huang

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our comprehensive testing on prevalent audio generation platforms demonstrates that ALIGNED-IS not only preserves the quality of generated audio but also significantly improves the watermark detectability compared to the state-of-the-art distortion-free watermarking adaptations, establishing a new benchmark in secure audio technology applications. We evaluate the performance of our methods against various statistical watermarking baselines, including two biased watermarking approaches, KGW (Kirchenbauer et al., 2023) and Unigram (Zhao et al., 2023), as well as three unbiased watermarking algorithms, γ-reweight (Hu et al., 2023), DiPmark (Wu et al., 2023b), and ITS-edit (Kuditipudi et al., 2023).
Researcher Affiliation | Academia | Yihan Wu, Georgios Milis, Ruibo Chen, Heng Huang; Department of Computer Science, University of Maryland, College Park; EMAIL
Pseudocode | Yes | Algorithm 1 ALIGNED-IS generator. ... Algorithm 2 ALIGNED-IS detector. ... Algorithm 3 Aligned inverse sampling.
Open Source Code | Yes | We release the code in https://github.com/g-milis/Aligned IS.
Open Datasets | Yes | For text prompting, we follow Kirchenbauer et al. (2023); Hu et al. (2023) and include three MMW datasets (Piet et al., 2023), Dolly CW (Conover et al., 2023), and two tasks from WaterBench (Tu et al., 2023). For speech prompting, we use the validation set of LibriSpeech (Panayotov et al., 2015).
Dataset Splits | Yes | We use the validation set of LibriSpeech (Panayotov et al., 2015). We generate 500 examples for each task.
Hardware Specification | Yes | All experiments are conducted on a NVIDIA A6000 GPU.
Software Dependencies | No | The paper references specific models (e.g., Spirit LM, SpeechGPT, wav2vec, HuBERT), and also tools like "k-means algorithm", "DCCRN (Hu et al., 2020)", "NISQA", "DNSMOSPro". However, it doesn't give version numbers for general software dependencies like Python, PyTorch, or specific libraries used for the implementation.
Experiment Setup | Yes | We select α ∈ {0.3, 0.4} for DiPmark, and δ ∈ {1.0, 1.5, 2.0} and γ = 0.5 for KGW watermark (Kirchenbauer et al., 2023), δ ∈ {1.0, 1.5, 2.0} for Unigram (Zhao et al., 2023). For ALIGNED-IS, we partition the token-embedding space into 20 clusters using the k-means algorithm, then perform linear sum assignment to ensure that the resulting centroids are sufficiently separated to accommodate potential retokenization errors. We justify the choice of h = 20 in Appendix E.1.
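
For readers unfamiliar with the KGW baseline settings quoted in the Experiment Setup row, the sketch below illustrates what γ and δ control: γ is the fraction of the vocabulary placed on a pseudo-random "green list" at each step, and δ is the bias added to the logits of green tokens before sampling. This is a minimal illustration of the general scheme from Kirchenbauer et al. (2023), not code from the paper or its repository. The function names, the SHA-256 seeding, and the `key` parameter are assumptions made for this example.

```python
import hashlib
import math
import random


def green_list(prev_token, vocab_size, gamma, key=0):
    """Pick a gamma-fraction of token ids, seeded by the previous token.

    The SHA-256 seeding is a stand-in chosen for this sketch; the real
    KGW implementation uses its own hashing scheme.
    """
    digest = hashlib.sha256(f"{key}:{prev_token}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])


def watermarked_probs(logits, prev_token, gamma, delta, key=0):
    """Add delta to green-token logits, then apply a softmax."""
    green = green_list(prev_token, len(logits), gamma, key)
    biased = [x + delta if i in green else x for i, x in enumerate(logits)]
    peak = max(biased)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in biased]
    total = sum(exps)
    return [e / total for e in exps]


def detection_z_score(tokens, vocab_size, gamma, key=0):
    """z = (green_hits - gamma * T) / sqrt(T * gamma * (1 - gamma)).

    T is the number of scored tokens (every token after the first).
    """
    hits = sum(
        cur in green_list(prev, vocab_size, gamma, key)
        for prev, cur in zip(tokens, tokens[1:])
    )
    t = len(tokens) - 1
    return (hits - gamma * t) / math.sqrt(t * gamma * (1 - gamma))
```

With γ = 0.5 and uniform logits, each green token ends up e^δ times as likely as each red token, which is why larger δ values in the quoted grid {1.0, 1.5, 2.0} make the watermark easier to detect at the cost of more distortion. Unbiased schemes such as the ones the paper compares against are designed to avoid that shift in the output distribution.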