DeAR: A Deep-Learning-Based Audio Re-recording Resilient Watermarking
Authors: Chang Liu, Jie Zhang, Han Fang, Zehua Ma, Weiming Zhang, Nenghai Yu
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that the proposed algorithm can resist not only common electronic channel distortions but also AR distortions. Under the premise of high-quality embedding (SNR = 25.86 dB), at a common re-recording distance (20 cm), the algorithm achieves an average bit recovery accuracy of 98.55%. Experimental results verify that the proposed method achieves satisfying robustness against audio re-recording at different distances while guaranteeing the requirement of fidelity. Dataset: "We conduct our experiments on FMA (Defferrard et al. 2017), a famous music analysis dataset in which 12000 audios are utilized for the training of the proposed DeAR, and 200 randomly selected audios are adopted as testing audios." |
| Researcher Affiliation | Academia | Chang Liu1, Jie Zhang1, Han Fang3, Zehua Ma1, Weiming Zhang1, Nenghai Yu1 (1University of Science and Technology of China, 2University of Waterloo, 3National University of Singapore) |
| Pseudocode | No | The paper describes the method in prose and mathematical equations but does not include any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository for the methodology described. |
| Open Datasets | Yes | We conduct our experiments on FMA (Defferrard et al. 2017), a famous music analysis dataset in which 12000 audios are utilized for the training of the proposed DeAR, and 200 randomly selected audios are adopted as testing audios. |
| Dataset Splits | No | The paper specifies training and testing sets ("12000 audios are utilized for the training... and 200 randomly selected audios are adopted as testing audios.") but does not explicitly mention a separate validation set split or its size/proportion. |
| Hardware Specification | No | The paper mentions "one consumer-grade speaker, SENNHEISER Sp10" and "a consumer-grade microphone, ATR2100-USB" for re-recording experiments. However, it does not specify the computational hardware (e.g., GPUs, CPUs, memory) used to run the deep learning experiments (training and evaluation of the model). |
| Software Dependencies | No | The paper mentions using "Adam (Kingma and Ba 2014) with a learning rate of 10^-4 for optimization", but it does not list specific software versions for libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages (e.g., Python version) required for reproducibility. |
| Experiment Setup | Yes | In the training process of DeAR, we set λe = 150, λw = 1 and λd = 0.01, and utilize Adam (Kingma and Ba 2014) with a learning rate of 10^-4 for optimization by default. And we empirically set the threshold of the high-pass filtering (HF[·]) and that of the low-pass filtering (LF[·]), namely α and β, as 1 kHz and 4 kHz for training the model robust to re-recording. In the testing process, the same watermark bit sequence of 100 bits is embedded for all testing audios. |
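The hyperparameters reported in the Experiment Setup row (loss weights λe = 150, λw = 1, λd = 0.01; Adam with learning rate 10^-4; a 100-bit watermark at test time) can be collected into a minimal training-step sketch. This is only an illustration of how those stated values fit together: the encoder, decoder, and discriminator below are tiny placeholder `nn.Linear` modules, not the paper's actual DeAR architectures, and the loss terms are generic stand-ins for the embedding, watermark, and adversarial losses the weights apply to.

```python
# Hedged sketch: only the hyperparameters below are taken from the paper;
# the network modules and loss terms are placeholder assumptions.
import torch
import torch.nn as nn

# Reported settings: lambda_e = 150, lambda_w = 1, lambda_d = 0.01, Adam lr = 1e-4
LAMBDA_E, LAMBDA_W, LAMBDA_D = 150.0, 1.0, 0.01
LEARNING_RATE = 1e-4
NUM_BITS = 100  # a 100-bit watermark sequence is used at test time

# Placeholder networks standing in for DeAR's encoder/decoder/discriminator
encoder = nn.Linear(16, 16)
decoder = nn.Linear(16, NUM_BITS)
discriminator = nn.Linear(16, 1)

params = list(encoder.parameters()) + list(decoder.parameters())
optimizer = torch.optim.Adam(params, lr=LEARNING_RATE)

# One illustrative step on random data in place of real audio frames
audio = torch.randn(8, 16)
bits = torch.randint(0, 2, (8, NUM_BITS)).float()

watermarked = encoder(audio)
recovered = torch.sigmoid(decoder(watermarked))

loss_e = nn.functional.mse_loss(watermarked, audio)           # embedding fidelity
loss_w = nn.functional.binary_cross_entropy(recovered, bits)  # bit recovery
loss_d = torch.sigmoid(discriminator(watermarked)).mean()     # adversarial term
loss = LAMBDA_E * loss_e + LAMBDA_W * loss_w + LAMBDA_D * loss_d

optimizer.zero_grad()
loss.backward()
optimizer.step()
```

The weighting makes the design choice visible: fidelity dominates (λe = 150) while the adversarial term contributes only lightly (λd = 0.01), consistent with the paper's emphasis on high-quality embedding.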