FreqMark: Invisible Image Watermarking via Frequency Based Optimization in Latent Space

Authors: Yiyang Guo, Ruizhe Li, Mude Hui, Hanzhong Guo, Chen Zhang, Chuangjian Cai, Le Wan, Shangfei Wang

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate that FreqMark offers significant advantages in image quality and robustness, permits flexible selection of the encoding bit number, and achieves a bit accuracy exceeding 90% when encoding a 48-bit hidden message under various attack scenarios.
Researcher Affiliation | Collaboration | Yiyang Guo 1,5, Ruizhe Li 2, Mude Hui 3, Hanzhong Guo 4, Chen Zhang 1, Chuangjian Cai 5, Le Wan 5, Shangfei Wang 1; 1 University of Science and Technology of China, 2 Fudan University, 3 University of California, Santa Cruz, 4 The University of Hong Kong, 5 IEG, Tencent
Pseudocode | No | The paper does not contain any explicit pseudocode or algorithm blocks.
Open Source Code | No | We will declutter and release the code in the future. We have provided sufficient details for the replication of the paper in Appendix A.1.
Open Datasets | Yes | Datasets: A test dataset is compiled, consisting of 500 images randomly selected from the ImageNet validation set [16], in conjunction with 500 images generated using Stable Diffusion [42] based on prompts from the DiffusionDB [49] dataset.
Dataset Splits | No | A test dataset is compiled, consisting of 500 images randomly selected from the ImageNet validation set [16], in conjunction with 500 images generated using Stable Diffusion [42] based on prompts from the DiffusionDB [49] dataset.
Hardware Specification | Yes | Compute Resources: All experiments could be conducted on a single A100 GPU with 40GB memory.
Software Dependencies | No | The paper mentions specific models such as 'Stable Diffusion 2-1' and 'DINOv2' and the 'Adam optimizer', but does not provide version numbers for general software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | Hyperparameters: The KL auto-encoder from Stable Diffusion 2-1 [42] is utilized. Due to the significant reconstruction loss associated with low-resolution images, the images are upscaled to 512×512 for processing. In the watermark addition stage, the Adam optimizer is used with a learning rate of 2.0, training for 400 steps. We set the PSNR loss weight λp to 0.05 and the LPIPS loss weight λi to 0.25. To encode the watermark, the first 128 dimensions of the output feature generated by the DINOv2-small image encoder [37] are utilized. In the experiments, we set the directional vectors as a set of 48 vectors, where the i-th vector has a value of 1 in its i-th dimension and 0 in the remaining dimensions. In addition, during the training phase, two types of spatial transformations and pixel noise are selected with equal probability. For rotation augmentation, the rotation angle is randomly chosen in 90-degree increments. The crop augmentation is set with a crop scale range of [0.2, 1.0] and a crop ratio range of [3/4, 4/3].
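
Since the code has not yet been released, the sketch below illustrates how the reported hyperparameters could fit together in a watermark-embedding loop. It is a minimal, hypothetical reconstruction, not the authors' implementation: `vae_encode`, `vae_decode`, `extract_features`, and `lpips_loss` are placeholder callables standing in for the Stable Diffusion 2-1 KL auto-encoder, the DINOv2-small encoder, and an LPIPS metric; the hinge-style message loss, the MSE surrogate for the PSNR term, and the pixel-noise strength are assumptions rather than details stated in the paper.

```python
import torch
import torch.nn.functional as F
import torchvision.transforms as T

# Hyperparameters reported in the experiment setup above.
NUM_BITS = 48      # length of the hidden message
FEAT_DIM = 128     # first 128 dimensions of the DINOv2-small features
LR       = 2.0     # Adam learning rate
STEPS    = 400     # optimization steps per image
LAMBDA_P = 0.05    # PSNR loss weight
LAMBDA_I = 0.25    # LPIPS loss weight

# Directional vectors: the i-th vector is 1 in its i-th dimension, 0 elsewhere.
DIRECTIONS = torch.eye(NUM_BITS, FEAT_DIM)

def random_augment(img):
    """Pick one of the augmentations described in the setup with equal probability."""
    choice = torch.randint(0, 3, (1,)).item()
    if choice == 0:  # rotation in 90-degree increments
        k = torch.randint(1, 4, (1,)).item()
        return torch.rot90(img, k, dims=(-2, -1))
    if choice == 1:  # random crop, scale [0.2, 1.0], ratio [3/4, 4/3]
        crop = T.RandomResizedCrop(img.shape[-1], scale=(0.2, 1.0),
                                   ratio=(3 / 4, 4 / 3), antialias=True)
        return crop(img)
    return img + 0.05 * torch.randn_like(img)  # pixel noise (std is a placeholder)

def embed_watermark(image, message, vae_encode, vae_decode, extract_features, lpips_loss):
    """Optimize a latent perturbation so that DINOv2 features of the decoded image
    align with signed directional vectors encoding `message` (a {0,1} tensor of
    length 48). All tensors are assumed to live on the same device."""
    latent = vae_encode(image).detach()
    delta = torch.zeros_like(latent, requires_grad=True)
    optimizer = torch.optim.Adam([delta], lr=LR)
    signs = message.float() * 2 - 1  # map {0, 1} bits to {-1, +1} targets

    for _ in range(STEPS):
        watermarked = vae_decode(latent + delta)
        feats = extract_features(random_augment(watermarked))[:, :FEAT_DIM]
        proj = feats @ DIRECTIONS.to(feats).T          # (batch, 48) projections
        msg_loss = F.relu(-signs.to(proj) * proj).mean()  # assumed hinge-style message loss
        psnr_loss = F.mse_loss(watermarked, image)        # MSE as a PSNR surrogate
        percep_loss = lpips_loss(watermarked, image).mean()
        loss = msg_loss + LAMBDA_P * psnr_loss + LAMBDA_I * percep_loss

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    return vae_decode(latent + delta).detach()
```

Decoding would then recover each bit from the sign of the corresponding projection; the paper reports that this setup keeps bit accuracy above 90% for 48-bit messages under the listed attacks.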