Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Robust Distortion-Free Watermark for Autoregressive Audio Generation Models
Authors: Yihan Wu, Georgios Milis, Ruibo Chen, Heng Huang
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our comprehensive testing on prevalent audio generation platforms demonstrates that ALIGNED-IS not only preserves the quality of generated audio but also significantly improves the watermark detectability compared to the state-of-the-art distortion-free watermarking adaptations, establishing a new benchmark in secure audio technology applications. We evaluate the performance of our methods against various statistical watermarking baselines, including two biased watermarking approaches, KGW (Kirchenbauer et al., 2023) and Unigram (Zhao et al., 2023), as well as three unbiased watermarking algorithms, γ-reweight (Hu et al., 2023), DiPmark (Wu et al., 2023b), and ITS-edit (Kuditipudi et al., 2023). |
| Researcher Affiliation | Academia | Yihan Wu, Georgios Milis, Ruibo Chen, Heng Huang; Department of Computer Science, University of Maryland, College Park; EMAIL |
| Pseudocode | Yes | Algorithm 1 ALIGNED-IS generator. ... Algorithm 2 ALIGNED-IS detector. ... Algorithm 3 Aligned inverse sampling. |
| Open Source Code | Yes | We release the code in https://github.com/g-milis/AlignedIS. |
| Open Datasets | Yes | For text prompting, we follow Kirchenbauer et al. (2023); Hu et al. (2023) and include three MMW datasets (Piet et al., 2023), Dolly CW (Conover et al., 2023), and two tasks from WaterBench (Tu et al., 2023). For speech prompting, we use the validation set of LibriSpeech (Panayotov et al., 2015). |
| Dataset Splits | Yes | We use the validation set of LibriSpeech (Panayotov et al., 2015). We generate 500 examples for each task. |
| Hardware Specification | Yes | All experiments are conducted on an NVIDIA A6000 GPU. |
| Software Dependencies | No | The paper references specific models (e.g., Spirit LM, SpeechGPT, wav2vec, HuBERT) and tools such as the "k-means algorithm", "DCCRN (Hu et al., 2020)", "NISQA", and "DNSMOSPro". However, it does not give version numbers for general software dependencies such as Python, PyTorch, or the specific libraries used for the implementation. |
| Experiment Setup | Yes | We select α ∈ {0.3, 0.4} for DiPmark, and δ ∈ {1.0, 1.5, 2.0} and γ = 0.5 for KGW watermark (Kirchenbauer et al., 2023), δ ∈ {1.0, 1.5, 2.0} for Unigram (Zhao et al., 2023). For ALIGNED-IS, we partition the token-embedding space into 20 clusters using the k-means algorithm, then perform linear sum assignment to ensure that the resulting centroids are sufficiently separated to accommodate potential retokenization errors. We justify the choice of h = 20 in Appendix E.1. |
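
The clustering step quoted in the Experiment Setup row can be illustrated with a minimal sketch. This is not the authors' implementation: it runs plain Lloyd's k-means on synthetic 8-dimensional vectors that stand in for token embeddings, with h = 20 clusters as stated above, and reports the smallest pairwise centroid distance as a rough separation check. The function names, data size, dimensionality, iteration count, and seed are all hypothetical. The linear sum assignment step is left out because the summary does not say what cost matrix it uses.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on tuples of floats; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialise centroids from k randomly chosen data points.
    centroids = rng.sample(points, k)

    def nearest(p):
        # Index of the centroid closest to p in Euclidean distance.
        return min(range(k), key=lambda c: math.dist(p, centroids[c]))

    for _ in range(iters):
        labels = [nearest(p) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    # Final assignment so the labels match the returned centroids.
    labels = [nearest(p) for p in points]
    return centroids, labels


def min_centroid_distance(centroids):
    """Smallest Euclidean distance between any two distinct centroids."""
    return min(
        math.dist(a, b)
        for i, a in enumerate(centroids)
        for b in centroids[i + 1:]
    )


# Synthetic stand-in for a token-embedding table (assumption: 500 tokens, 8 dims).
data_rng = random.Random(0)
embeddings = [tuple(data_rng.gauss(0.0, 1.0) for _ in range(8)) for _ in range(500)]

H = 20  # number of clusters, matching h = 20 in the row above
centroids, labels = kmeans(embeddings, H)
separation = min_centroid_distance(centroids)
print(f"{len(centroids)} centroids, min pairwise distance = {separation:.3f}")
```

In a real setting the embeddings would come from the audio model's token-embedding matrix, and a library k-means would likely replace this hand-written loop.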