Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

MLEP: Multi-granularity Local Entropy Patterns for Generalized AI-generated Image Detection

Authors: Lin Yuan, Xiaowan Li, Yan Zhang, Jiawei Zhang, Hongbo Li, Xinbo Gao

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments in an open-world setting, involving images synthesized by 32 distinct generative models, demonstrate that our approach achieves substantial improvements over state-of-the-art methods in both accuracy and generalization.
Researcher Affiliation	Academia	Chongqing Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications, Chongqing 400065, China EMAIL, EMAIL, EMAIL
Pseudocode	No	The paper describes the approach using text and figures (e.g., Figure 2 illustrates the method's core steps), but it does not include a formal pseudocode block or algorithm listing.
Open Source Code	Yes	Our code and models are available at https://www.github.com/fkeufss/MLEP/.
Open Datasets	Yes	We adopt the cross-dataset setup from [7], using the ForenSynths [5] dataset for training... The GAN-Set includes ProGAN [24], StyleGAN [26], StyleGAN2 [27], BigGAN [28], CycleGAN [29], StarGAN [30], GauGAN [31], AttGAN [32], BEGAN [33], CramerGAN [34], InfoMaxGAN [35], MMDGAN [36], RelGAN [37], S3GAN [38], SNGAN [39], and STGAN [40], with the former seven obtained from the dataset ForenSynths [5] and the latter nine from the dataset GANGen-Detection [41]. The Diffusion-Set contains DDPM [2], IDDPM [42], ADM [43], LDM [44], PNDM [45], VQ-Diffusion [46], Stable Diffusion (SD) v1/v2 [44], DALL·E mini [47], three Glide [48] variants, and two LDM [44] variants. Of these models, the first eight are sourced from the DiffusionForensics dataset [13], while the remainder are from the UniversalFakeDetect dataset [15]. Furthermore, we include images from two commercial models, Midjourney and DALL·E 2, sourced from the social platform Discord as provided by [7]. ... an additional experiment using DiffusionDB [51]
Dataset Splits	No	The paper mentions: "We adopt the cross-dataset setup from [7], using the ForenSynths [5] dataset for training, which includes 20 content categories with 18,000 ProGAN [24] generated images and an equal number of real images from LSUN [25]. Following [7], we train only on four categories: cars, cats, chairs, and horses, posing a challenging cross-scene setting. ... All images were resized to 224 × 224, with random cropping for training and center cropping for testing." While it describes the datasets used for training and testing, and preprocessing steps, it does not provide specific train/validation/test percentages or sample counts for its own experiments, nor does it explicitly state the use of standard splits for the ForenSynths dataset beyond adopting a "cross-dataset setup from [7]".
Hardware Specification	Yes	All experiments ran on a server with two NVIDIA RTX A5000 GPUs.
Software Dependencies	No	The paper mentions using the Adam optimizer and ResNet-50 as a backbone classifier, but does not provide specific version numbers for any software libraries, frameworks, or programming languages used.
Experiment Setup	Yes	The training was performed using the Adam optimizer (learning rate of 0.002, batch size of 64). All experiments ran on a server with two NVIDIA RTX A5000 GPUs.
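
For readers who want the reported setup in machine-readable form, the values quoted in the table above can be collected into a small configuration record. This is a minimal sketch and is not taken from the authors' repository: the class name `ReportedSetup`, its field names, and the `UNREPORTED` tuple are hypothetical and chosen only for illustration. The values themselves (Adam, learning rate 0.002, batch size 64, 224 × 224 images, four training categories, two RTX A5000 GPUs) are the ones quoted in the table.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ReportedSetup:
    """Setup values quoted in the table; names are illustrative, not the authors'."""

    optimizer: str = "Adam"
    learning_rate: float = 0.002
    batch_size: int = 64
    image_size: tuple = (224, 224)  # images are resized to 224 x 224
    train_crop: str = "random"      # random cropping for training
    test_crop: str = "center"       # center cropping for testing
    train_categories: tuple = ("cars", "cats", "chairs", "horses")
    hardware: str = "2x NVIDIA RTX A5000"


# Items the table marks as "No", i.e. not reported in the paper.
# The wording of these entries is an assumption made for this sketch.
UNREPORTED = (
    "software library versions",
    "explicit train/validation/test split sizes",
)

if __name__ == "__main__":
    setup = ReportedSetup()
    for key, value in asdict(setup).items():
        print(f"{key}: {value}")
    for item in UNREPORTED:
        print(f"not reported: {item}")
```

Keeping the unreported items next to the reported ones makes it explicit which parts of a rerun would still have to be guessed or taken from the released code.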