Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

ObCLIP: Oblivious CLoud-Device Hybrid Image Generation with Privacy Preservation

Authors: Haoqi Wu, Wei Dai, Ming Xu, Wang Li, Qiang Yan

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments across multiple datasets demonstrate that ObCLIP provides rigorous privacy and comparable utility to cloud models with slightly increased server cost. We conduct extensive text-to-image generation experiments on several stable diffusion models across three datasets.
Researcher Affiliation | Collaboration | Haoqi Wu¹, Wei Dai¹, Ming Xu², Li Wang¹, Qiang Yan¹; ¹TikTok Inc., ²National University of Singapore
Pseudocode | Yes | Algorithm 1: Oblivious Hybrid Generation; Algorithm 2: Batch-reused Attention Module
Open Source Code | No | We do not have the time to refactor the code, which is of poor readability. We promise to open-source the code to reproduce the experimental results on GitHub once accepted.
Open Datasets | Yes | To evaluate the performance of ObCLIP, we adopt two commonly-used datasets: 1) MS-COCO 2014 dataset [23] with a resolution of 512×512. We use 30k prompts from its validation split. 2) MJHQ [19] with a resolution of 1024×1024. For more comprehensive evaluation on oblivious generation, we construct a candidate prompt dataset using 10 templates, like "High-quality, face portrait photo of a <age> <ethnicity> <gender>" with random fill on these sensitive attributes. The detailed construction is provided in Appendix B.3.
Dataset Splits | Yes | To evaluate the performance of ObCLIP, we adopt two commonly-used datasets: 1) MS-COCO 2014 dataset [23] with a resolution of 512×512. We use 30k prompts from its validation split.
Hardware Specification | Yes | All the experiments are conducted on one Ubuntu machine equipped with one Intel Xeon Platinum 8260 CPU, 16GB of RAM and 1 NVIDIA Tesla-V100-SXM2-32GB GPU.
Software Dependencies | No | The paper mentions using several models (e.g., SD-v1.4, SDXL, DistilBERT) and a DPM-Solver, but does not specify versions for programming languages, libraries (like PyTorch or TensorFlow), or other software dependencies.
Experiment Setup | Yes | Models. We consider several combinations for hybrid generation. We consider SD-v1.4 [35], and its compressed versions BK-SDM-small and BK-SDM-tiny [17]... We opt for 25-step DPM scheduler (8-step for LCM-SDXL) for all evaluated works. For ObCLIP, we mainly adopt two acceleration configurations: 1) switch point k = 5, cache point r = 3 and skip point s = 3; 2) k = 10, r = 4 and s = 6... Regarding image quality, we follow prior works to evaluate the visual quality using Fréchet Inception Distance (FID) [12] and Inception Score (IS) [37]. We assess text-image alignment using CLIP score [11]... The FLOPs and latency are measured for a batch size of 4. We run each experiment ten times and report the average results.
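
The Open Datasets entry above describes building a candidate prompt dataset by randomly filling sensitive attributes into templates. The following is a minimal sketch of that idea, not the authors' code (which is not released). The template wording follows the single example quoted above; the attribute values, the function names fill_template and build_candidates, and the fixed seed are all assumptions made for illustration. The paper's actual 10 templates and value sets are in its Appendix B.3.

```python
import random

# One template, worded after the example quoted in the table above.
# The paper reports 10 templates; the other nine are not reproduced here.
TEMPLATE = "High-quality, face portrait photo of a {age} {ethnicity} {gender}"

# Hypothetical attribute values, chosen only for illustration.
ATTRIBUTES = {
    "age": ["young", "middle-aged", "elderly"],
    "ethnicity": ["Asian", "Black", "Hispanic", "White"],
    "gender": ["man", "woman"],
}


def fill_template(template, attributes, rng):
    """Fill each placeholder with a uniformly random value for that attribute."""
    choice = {name: rng.choice(values) for name, values in attributes.items()}
    return template.format(**choice)


def build_candidates(template, attributes, n, seed=0):
    """Return n randomly filled prompts; the seed makes the draw repeatable."""
    rng = random.Random(seed)
    return [fill_template(template, attributes, rng) for _ in range(n)]


if __name__ == "__main__":
    for prompt in build_candidates(TEMPLATE, ATTRIBUTES, 3):
        print(prompt)
```

Seeding the generator is a choice made here so that repeated runs draw the same prompts; the table does not say how the paper handled randomness.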