Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Robust Distortion-Free Watermark for Autoregressive Audio Generation Models

Authors: Yihan Wu, Georgios Milis, Ruibo Chen, Heng Huang

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our comprehensive testing on prevalent audio generation platforms demonstrates that ALIGNED-IS not only preserves the quality of generated audio but also significantly improves the watermark detectability compared to the state-of-the-art distortion-free watermarking adaptations, establishing a new benchmark in secure audio technology applications. We evaluate the performance of our methods against various statistical watermarking baselines, including two biased watermarking approaches, KGW (Kirchenbauer et al., 2023) and Unigram (Zhao et al., 2023), as well as three unbiased watermarking algorithms, γ-reweight (Hu et al., 2023), DiPmark (Wu et al., 2023b), and ITS-edit (Kuditipudi et al., 2023).
Researcher Affiliation	Academia	Yihan Wu, Georgios Milis, Ruibo Chen, Heng Huang; Department of Computer Science, University of Maryland, College Park
Pseudocode	Yes	Algorithm 1 ALIGNED-IS generator. ... Algorithm 2 ALIGNED-IS detector. ... Algorithm 3 Aligned inverse sampling.
Open Source Code	Yes	We release the code in https://github.com/g-milis/Aligned IS.
Open Datasets	Yes	For text prompting, we follow Kirchenbauer et al. (2023); Hu et al. (2023) and include three MMW datasets (Piet et al., 2023), Dolly CW (Conover et al., 2023), and two tasks from WaterBench (Tu et al., 2023). For speech prompting, we use the validation set of LibriSpeech (Panayotov et al., 2015).
Dataset Splits	Yes	We use the validation set of LibriSpeech (Panayotov et al., 2015). We generate 500 examples for each task.
Hardware Specification	Yes	All experiments are conducted on a NVIDIA A6000 GPU.
Software Dependencies	No	The paper references specific models (e.g., Spirit LM, SpeechGPT, wav2vec, HuBERT) and tools such as the "k-means algorithm", "DCCRN (Hu et al., 2020)", "NISQA", and "DNSMOSPro". However, it does not give version numbers for general software dependencies such as Python, PyTorch, or the specific libraries used for the implementation.
Experiment Setup	Yes	We select α ∈ {0.3, 0.4} for DiPmark, and δ ∈ {1.0, 1.5, 2.0} and γ = 0.5 for KGW watermark (Kirchenbauer et al., 2023), δ ∈ {1.0, 1.5, 2.0} for Unigram (Zhao et al., 2023). For ALIGNED-IS, we partition the token-embedding space into 20 clusters using the k-means algorithm, then perform linear sum assignment to ensure that the resulting centroids are sufficiently separated to accommodate potential retokenization errors. We justify the choice of h = 20 in Appendix E.1.
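
The hyperparameter settings quoted in the Experiment Setup row can be enumerated as a small grid. The following is only an illustrative sketch: the dictionary layout, the parameter names (`alpha`, `delta`, `gamma`, `num_clusters`), and the helper functions are assumptions made for this example and are not taken from the paper's released code. γ-reweight and ITS-edit are omitted because the quoted text lists no settings for them.

```python
from itertools import product

# Values quoted in the Experiment Setup row. Key and parameter names are
# hypothetical labels chosen for this sketch, not identifiers from the paper's code.
BASELINE_GRIDS = {
    "DiPmark": {"alpha": [0.3, 0.4]},
    "KGW": {"delta": [1.0, 1.5, 2.0], "gamma": [0.5]},
    "Unigram": {"delta": [1.0, 1.5, 2.0]},
    "ALIGNED-IS": {"num_clusters": [20]},  # h = 20 k-means clusters
}


def expand_grid(grid):
    """Return every combination of the listed values as a list of dicts."""
    names = list(grid)
    return [
        dict(zip(names, values))
        for values in product(*(grid[name] for name in names))
    ]


def all_configs(grids=BASELINE_GRIDS):
    """Flatten the per-method grids into (method, settings) pairs."""
    return [
        (method, settings)
        for method, grid in grids.items()
        for settings in expand_grid(grid)
    ]


if __name__ == "__main__":
    for method, settings in all_configs():
        print(method, settings)
```

Under these assumptions the grid yields two DiPmark settings, three KGW settings, three Unigram settings, and one ALIGNED-IS setting.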