Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.

ObCLIP: Oblivious CLoud-Device Hybrid Image Generation with Privacy Preservation

Authors: Haoqi Wu, Wei Dai, Ming Xu, Wang Li, Qiang Yan

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments across multiple datasets demonstrate that ObCLIP provides rigorous privacy and comparable utility to cloud models with slightly increased server cost. We conduct extensive text-to-image generation experiments on several stable diffusion models across three datasets.
Researcher Affiliation	Collaboration	Haoqi Wu1, Wei Dai1, Ming Xu2, Li Wang1, Qiang Yan1; 1TikTok Inc., 2National University of Singapore
Pseudocode	Yes	Algorithm 1: Oblivious Hybrid Generation; Algorithm 2: Batch-reused Attention Module
Open Source Code	No	We do not have the time to refactor the code, which is of poor readability. We promise to open-source the code to reproduce the experimental results on GitHub once accepted.
Open Datasets	Yes	To evaluate the performance of ObCLIP, we adopt two commonly-used datasets: 1) MS-COCO 2014 dataset [23] with a resolution of 512 × 512. We use 30k prompts from its validation split. 2) MJHQ [19] with a resolution of 1024 × 1024. For more comprehensive evaluation on oblivious generation, we construct a candidate prompt dataset using 10 templates, like "High-quality, face portrait photo of a <age> <ethnicity> <gender>" with random fill on these sensitive attributes. The detailed construction is provided in Appendix B.3.
Dataset Splits	Yes	To evaluate the performance of ObCLIP, we adopt two commonly-used datasets: 1) MS-COCO 2014 dataset [23] with a resolution of 512 × 512. We use 30k prompts from its validation split.
Hardware Specification	Yes	All the experiments are conducted on one Ubuntu machine equipped with one Intel Xeon Platinum 8260 CPU, 16GB of RAM and 1 NVIDIA Tesla-V100-SXM2-32GB GPU.
Software Dependencies	No	The paper mentions using several models (e.g., SD-v1.4, SDXL, DistilBERT) and a DPM-Solver, but does not specify versions for programming languages, libraries (like PyTorch or TensorFlow), or other software dependencies.
Experiment Setup	Yes	Models. We consider several combinations for hybrid generation. We consider SD-v1.4 [35], and its compressed versions BK-SDM-small and BK-SDM-tiny [17]... We opt for 25-step DPM scheduler (8-step for LCM-SDXL) for all evaluated works. For ObCLIP, we mainly adopt two acceleration configurations: 1) switch point k = 5, cache point r = 3 and skip point s = 3; 2) k = 10, r = 4 and s = 6... Regarding image quality, we follow prior works to evaluate the visual quality using Fréchet Inception Distance (FID) [12] and Inception Score (IS) [37]. We assess text-image alignment using CLIP score [11]... The FLOPs and latency are measured for a batch size of 4. We run each experiment ten times and report the average results.
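
Illustration: The Open Datasets row mentions a candidate prompt dataset built from templates with random fill on sensitive attributes. The sketch below shows one way such a fill could work. It is not the authors' code: the template wording is adapted from the single example quoted above, and the attribute value lists, function names, and seed are hypothetical placeholders (the actual construction is described in the paper's Appendix B.3).

```python
import random

# Template adapted from the example quoted in the Open Datasets row.
TEMPLATE = "High-quality, face portrait photo of a <age> <ethnicity> <gender>"

# Hypothetical attribute values, chosen only for illustration;
# the paper's real lists are not given on this page.
ATTRIBUTES = {
    "age": ["young", "middle-aged", "elderly"],
    "ethnicity": ["Asian", "Black", "Hispanic", "White"],
    "gender": ["man", "woman"],
}


def fill_template(template, attributes, rng):
    """Replace each <name> placeholder with a randomly chosen value."""
    prompt = template
    for name, values in attributes.items():
        prompt = prompt.replace(f"<{name}>", rng.choice(values))
    return prompt


def build_candidates(template, attributes, n, seed=0):
    """Build n candidate prompts; a fixed seed keeps the set reproducible."""
    rng = random.Random(seed)
    return [fill_template(template, attributes, rng) for _ in range(n)]


candidates = build_candidates(TEMPLATE, ATTRIBUTES, n=5)
for prompt in candidates:
    print(prompt)
```

Seeding the generator is a deliberate choice here: it makes the candidate set repeatable across runs, which matters when the generated prompts are themselves part of an evaluation.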