Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

ImageSentinel: Protecting Visual Datasets from Unauthorized Retrieval-Augmented Image Generation

Authors: Ziyuan Luo, Yangyi Zhao, Ka Chun Cheung, Simon See, Renjie Wan

NeurIPS 2025

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Experimental results demonstrate that ImageSentinel effectively detects unauthorized dataset usage while preserving generation quality for authorized applications. |
| Researcher Affiliation | Collaboration | Ziyuan Luo1,2, Yangyi Zhao1, Ka Chun Cheung2, Simon See2, Renjie Wan1; 1Department of Computer Science, Hong Kong Baptist University; 2NVIDIA AI Technology Center, NVIDIA |
| Pseudocode | No | The paper describes algorithms and processes such as the 'sentinel synthesis algorithm S' and detailed methodologies for attribute extraction and key-guided image synthesis, but it does not include a formally labeled pseudocode block or algorithm box. |
| Open Source Code | Yes | Code is available at https://github.com/luo-ziyuan/ImageSentinel. |
| Open Datasets | Yes | For the LLaVA Visual Instruct Pretrain (LLaVA-Pretrain) Dataset [51], we use a subset containing 10,000 images as the reference image database... For the Product-10K dataset [52], we utilize the test split containing 30,000 product images as the reference image database... |
| Dataset Splits | Yes | For the Product-10K dataset [52], we utilize the test split containing 30,000 product images as the reference image database |
| Hardware Specification | Yes | All experiments are conducted on 8 NVIDIA Tesla V100 GPUs. |
| Software Dependencies | No | The paper mentions specific models and systems used like GPT-4o [47], SDXL [45], CLIP ViT-B/32 [49], SigLIP ViT-B/16 [50], and DINO ViT-S/16 [48], but does not provide specific software dependencies with version numbers such as Python, PyTorch, or other libraries. |
| Experiment Setup | Yes | Unless otherwise specified, we utilize GPT-4o [47] as our proxy vision-language model M and text-to-image model T due to its strong capabilities in attribute extraction, and set the key length to 6 characters. For the RAIG system implementation, we employ three generation modules: SDXL [45] equipped with the ViT-H IP-adapter [46], OmniGen [44], and GPT-4o [47]. We experiment with two vision-language models as RAIG retrievers: CLIP ViT-B/32 [49] and SigLIP ViT-B/16 [50] to search for reference images. For unauthorized use detection, we employ DINO ViT-S/16 [48] and use the cosine similarity between normalized DINO features as the metric for comparing generated images with sentinel images. All experiments are conducted on 8 NVIDIA Tesla V100 GPUs. For the LLaVA Visual Instruct Pretrain (LLaVA-Pretrain) Dataset [51], we use a subset containing 10,000 images as the reference image database... For the Product-10K dataset [52], we utilize the test split containing 30,000 product images as the reference image database... |
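
The Experiment Setup entry states that detection compares generated images with sentinel images using the cosine similarity between normalized DINO features. The sketch below illustrates only that metric in plain Python; it is not the authors' code. The short lists stand in for real DINO ViT-S/16 embeddings, and the helper `max_sentinel_similarity`, the rule of taking the maximum over sentinels, and the `THRESHOLD` value are all hypothetical choices made for illustration.

```python
import math
from typing import Sequence


def l2_normalize(vec: Sequence[float]) -> list[float]:
    """Scale a feature vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, computed as the dot product of normalized vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def max_sentinel_similarity(
    generated: Sequence[float], sentinels: Sequence[Sequence[float]]
) -> float:
    """Hypothetical helper: best match between one generated image and any sentinel."""
    return max(cosine_similarity(generated, s) for s in sentinels)


# Toy stand-ins for image embeddings (assumed values, not real DINO features).
sentinel_features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
generated_feature = [2.0, 0.0, 0.0]

THRESHOLD = 0.8  # hypothetical decision threshold, not taken from the paper
score = max_sentinel_similarity(generated_feature, sentinel_features)
flagged = score >= THRESHOLD
print(f"similarity={score:.3f} flagged={flagged}")
```

Because the vectors are normalized first, the score depends only on direction, so a feature that is a scaled copy of a sentinel feature still matches it exactly.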