Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

FerretNet: Efficient Synthetic Image Detection via Local Pixel Dependencies

Authors: Shuqiao Liang, Jian Liu, Chen Renzhang, Quanlong Guan

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments demonstrate that FerretNet trained exclusively on the 4-class ProGAN dataset achieves an average accuracy of 97.1% on an open-world benchmark comprising 22 generative models. Our code and datasets are publicly available at https://github.com/xigua7105/FerretNet. 5 Experiments 5.1 Dataset Construction 5.2 Implementation Details 5.3 Main Results 5.4 Ablation Study
Researcher Affiliation	Academia	Shuqiao Liang Jian Liu Renzhang Chen Quanlong Guan Jinan University EMAIL, EMAIL
Pseudocode	Yes	Algorithm 1 Local Dependency Feature Extraction via Zero-Masked Median Deviation Input: I: Image tensor of shape (C, H, W); n: Neighborhood size (n is odd) Output: LPD: Feature map of shape (C, H, W)
Open Source Code	Yes	Our code and datasets are publicly available at https://github.com/xigua7105/FerretNet.
Open Datasets	Yes	Our code and datasets are publicly available at https://github.com/xigua7105/FerretNet. We present Synthetic-Pop, a 60K-image benchmark for evaluating detection models against high-fidelity generators, containing 30K synthetic images from six models and 30K real images from COCO [25] and LAION-Aesthetics V2 (6.5+) [45]. See Appendix A for more details.
Dataset Splits	Yes	5.1.1 Training Dataset To ensure a consistent evaluation baseline, we follow the protocols established in [16, 31, 27, 50], utilizing four semantic classes (car, cat, chair, horse) from the ForenSynths dataset [51]. Each class contains 18,000 synthetic images generated by ProGAN [19], paired with an equal number of real images from the LSUN dataset [56]. All methods compared in this study were trained or fine-tuned on this same limited ProGAN 4-class dataset, except for CO-SPY [2], which utilized its officially released weights trained on other datasets. 5.1.2 Testing Dataset To assess the generalization ability of the proposed method under real-world conditions, we evaluate its performance on diverse synthetic and real images from four distinct test sets, comprising a total of 22 generative models: ForenSynths. ... totaling 62,000 images. Diffusion-6-cls. ... Each subset includes 1,000 synthetic and 1,000 real images, with some real images reused across subsets. Synthetic-Pop. ... resulting in six subsets, each containing 5,000 synthetic and 5,000 real images (60,000 images total). Synthetic-Aesthetic. ... An equal number of real images were sampled from LAION-Aesthetics V2 (6.5+) [45], resulting in a total of 80,000 images.
Hardware Specification	Yes	To measure real-world performance, we report throughput on the Synthetic-Aesthetic test set using an NVIDIA RTX 4090 GPU and an Intel(R) Xeon(R) Gold 6430 CPU (16 vCPUs), with a batch size of 128.
Software Dependencies	No	No specific software versions (e.g., Python, PyTorch, TensorFlow versions) are mentioned in the paper.
Experiment Setup	Yes	FerretNet is trained from scratch without any pretraining. We use the Adam optimizer with a learning rate of 2 × 10^-4, betas of (0.937, 0.999), and a weight decay of 5 × 10^-4. The model is trained for 100 epochs using a batch size of 32. During training, input images are randomly cropped to a resolution of 224 × 224 and augmented with random horizontal flipping. Binary Cross Entropy with Logits Loss (BCEWithLogitsLoss) is adopted as the loss function. For evaluation, images are center-cropped to 256 × 256.
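
The figures quoted in the Dataset Splits and Experiment Setup rows can be collected and cross-checked in a few lines. The following is a minimal sketch in standard-library Python, not the authors' code: the names TrainConfig, training_set_size, synthetic_pop_size, and steps_per_epoch are hypothetical, and keeping the last partial batch is an assumption (the paper excerpt does not say either way).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters as quoted in the Experiment Setup row.

    Field names are illustrative only; they are not taken from the
    authors' repository.
    """
    optimizer: str = "Adam"
    learning_rate: float = 2e-4
    betas: tuple = (0.937, 0.999)
    weight_decay: float = 5e-4
    epochs: int = 100
    batch_size: int = 32
    train_crop: int = 224          # random crop side length, in pixels
    eval_crop: int = 256           # center crop side length, in pixels
    horizontal_flip: bool = True   # random horizontal flipping during training
    loss: str = "BCEWithLogitsLoss"
    pretrained: bool = False       # trained from scratch


def training_set_size(classes=4, synthetic_per_class=18_000):
    """Each class pairs its synthetic images with an equal number of real ones."""
    return classes * synthetic_per_class * 2


def synthetic_pop_size(subsets=6, synthetic_per_subset=5_000, real_per_subset=5_000):
    """Synthetic-Pop: six subsets of synthetic plus real images."""
    return subsets * (synthetic_per_subset + real_per_subset)


def steps_per_epoch(num_images, batch_size):
    """Optimizer steps per epoch, assuming the last partial batch is kept."""
    return -(-num_images // batch_size)  # ceiling division


config = TrainConfig()
n_train = training_set_size()
print(n_train)                                      # 144000
print(synthetic_pop_size())                         # 60000
print(steps_per_epoch(n_train, config.batch_size))  # 4500
```

The Synthetic-Pop total matches the 60,000 images stated in the table. Because 144,000 is divisible by 32, the step count per epoch is the same whether or not the final batch is dropped.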