Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
ImageSentinel: Protecting Visual Datasets from Unauthorized Retrieval-Augmented Image Generation
Authors: Ziyuan Luo, Yangyi Zhao, Ka Chun Cheung, Simon See, Renjie Wan
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that ImageSentinel effectively detects unauthorized dataset usage while preserving generation quality for authorized applications. |
| Researcher Affiliation | Collaboration | Ziyuan Luo¹˒², Yangyi Zhao¹, Ka Chun Cheung², Simon See², Renjie Wan¹; ¹Department of Computer Science, Hong Kong Baptist University; ²NVIDIA AI Technology Center, NVIDIA |
| Pseudocode | No | The paper describes algorithms and processes such as the 'sentinel synthesis algorithm S' and detailed methodologies for attribute extraction and key-guided image synthesis, but it does not include a formally labeled pseudocode block or algorithm box. |
| Open Source Code | Yes | Code is available at https://github.com/luo-ziyuan/ImageSentinel. |
| Open Datasets | Yes | For the LLaVA Visual Instruct Pretrain (LLaVA-Pretrain) Dataset [51], we use a subset containing 10,000 images as the reference image database... For the Product-10K dataset [52], we utilize the test split containing 30,000 product images as the reference image database... |
| Dataset Splits | Yes | For the Product-10K dataset [52], we utilize the test split containing 30,000 product images as the reference image database |
| Hardware Specification | Yes | All experiments are conducted on 8 NVIDIA Tesla V100 GPUs. |
| Software Dependencies | No | The paper mentions specific models and systems used, such as GPT-4o [47], SDXL [45], CLIP ViT-B/32 [49], SigLIP ViT-B/16 [50], and DINO ViT-S/16 [48], but does not provide specific software dependencies with version numbers such as Python, PyTorch, or other libraries. |
| Experiment Setup | Yes | Unless otherwise specified, we utilize GPT-4o [47] as our proxy vision-language model M and text-to-image model T due to its strong capabilities in attribute extraction, and set the key length to 6 characters. For the RAIG system implementation, we employ three generation modules: SDXL [45] equipped with the ViT-H IP-adapter [46], OmniGen [44], and GPT-4o [47]. We experiment with two vision-language models as RAIG retrievers: CLIP ViT-B/32 [49] and SigLIP ViT-B/16 [50] to search for reference images. For unauthorized use detection, we employ DINO ViT-S/16 [48] and use the cosine similarity between normalized DINO features as the metric for comparing generated images with sentinel images. All experiments are conducted on 8 NVIDIA Tesla V100 GPUs. For the LLaVA Visual Instruct Pretrain (LLaVA-Pretrain) Dataset [51], we use a subset containing 10,000 images as the reference image database... For the Product-10K dataset [52], we utilize the test split containing 30,000 product images as the reference image database... |
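
The Experiment Setup entry names cosine similarity between normalized DINO features as the detection metric. The sketch below illustrates only that metric, on plain Python lists standing in for feature vectors. It is not the authors' implementation: feature extraction with DINO ViT-S/16 is omitted, and the helper names (`l2_normalize`, `cosine_similarity`, `max_sentinel_similarity`) and the idea of taking the maximum over several sentinel images are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, computed as the dot product of L2-normalized vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def max_sentinel_similarity(
    generated: Sequence[float], sentinels: Sequence[Sequence[float]]
) -> float:
    """Highest similarity between one generated-image feature and any sentinel feature.

    Hypothetical aggregation: the source only states that generated images
    are compared with sentinel images using cosine similarity.
    """
    if not sentinels:
        raise ValueError("at least one sentinel feature is required")
    return max(cosine_similarity(generated, s) for s in sentinels)


if __name__ == "__main__":
    # Toy 3-dimensional vectors standing in for real image features.
    generated_feature = [0.9, 0.1, 0.0]
    sentinel_features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    print(max_sentinel_similarity(generated_feature, sentinel_features))
```

In practice the score would be compared against a decision threshold; the summary above does not state one, so none is assumed here.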