Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

MLEP: Multi-granularity Local Entropy Patterns for Generalized AI-generated Image Detection

Authors: Lin Yuan, Xiaowan Li, Yan Zhang, Jiawei Zhang, Hongbo Li, Xinbo Gao

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments in an open-world setting, involving images synthesized by 32 distinct generative models, demonstrate that our approach achieves substantial improvements over state-of-the-art methods in both accuracy and generalization.
Researcher Affiliation | Academia | Chongqing Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
Pseudocode | No | The paper describes the approach using text and figures (e.g., Figure 2 illustrates the method's core steps), but it does not include a formal pseudocode block or algorithm listing.
Open Source Code | Yes | Our code and models are available at https://www.github.com/fkeufss/MLEP/.
Open Datasets | Yes | We adopt the cross-dataset setup from [7], using the ForenSynths [5] dataset for training... The GAN-Set includes ProGAN [24], StyleGAN [26], StyleGAN2 [27], BigGAN [28], CycleGAN [29], StarGAN [30], GauGAN [31], AttGAN [32], BEGAN [33], CramerGAN [34], InfoMaxGAN [35], MMDGAN [36], RelGAN [37], S3GAN [38], SNGAN [39], and STGAN [40], with the former seven obtained from the dataset ForenSynths [5] and the latter nine from the dataset GANGen-Detection [41]. The Diffusion-Set contains DDPM [2], IDDPM [42], ADM [43], LDM [44], PNDM [45], VQ-Diffusion [46], Stable Diffusion (SD) v1/v2 [44], DALL·E mini [47], three Glide [48] variants, and two LDM [44] variants. Of these models, the first eight are sourced from the DiffusionForensics dataset [13], while the remainder are from the UniversalFakeDetect dataset [15]. Furthermore, we include images from two commercial models, Midjourney and DALL·E 2, sourced from the social platform Discord as provided by [7]. ... an additional experiment using DiffusionDB [51]
Dataset Splits | No | The paper mentions: "We adopt the cross-dataset setup from [7], using the ForenSynths [5] dataset for training, which includes 20 content categories with 18,000 ProGAN [24] generated images and an equal number of real images from LSUN [25]. Following [7], we train only on four categories: cars, cats, chairs, and horses, posing a challenging cross-scene setting. ... All images were resized to 224 × 224, with random cropping for training and center cropping for testing." While it describes the datasets used for training and testing, and preprocessing steps, it does not provide specific train/validation/test percentages or sample counts for its own experiments, nor does it explicitly state the use of standard splits for the ForenSynths dataset beyond adopting a "cross-dataset setup from [7]".
Hardware Specification | Yes | All experiments ran on a server with two NVIDIA RTX A5000 GPUs.
Software Dependencies | No | The paper mentions using the Adam optimizer and ResNet-50 as a backbone classifier, but does not provide specific version numbers for any software libraries, frameworks, or programming languages used.
Experiment Setup | Yes | The training was performed using the Adam optimizer (learning rate of 0.002, batch size of 64). All experiments ran on a server with two NVIDIA RTX A5000 GPUs.
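
The training setup quoted in the table (Adam, learning rate 0.002, batch size 64, 224 × 224 inputs, random cropping for training and center cropping for testing) can be collected into a small configuration sketch. This is an illustration only and not the authors' code: the names `TrainConfig`, `center_crop_box`, and `random_crop_box` are hypothetical, the category labels are placeholder spellings of the four training categories, and the 256 × 256 input size in the usage lines is an assumption, since the paper's exact resize-then-crop order is not stated here.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters as quoted in the table; field names are hypothetical."""
    optimizer: str = "adam"
    learning_rate: float = 0.002
    batch_size: int = 64
    image_size: int = 224
    # Placeholder labels for the four training categories named in the paper.
    train_categories: tuple = ("car", "cat", "chair", "horse")
    num_gpus: int = 2


def center_crop_box(width, height, size):
    """Return (left, top, right, bottom) for a centered size x size crop."""
    if size > width or size > height:
        raise ValueError("crop size exceeds image dimensions")
    left = (width - size) // 2
    top = (height - size) // 2
    return left, top, left + size, top + size


def random_crop_box(width, height, size, rng):
    """Return (left, top, right, bottom) for a uniformly placed crop."""
    if size > width or size > height:
        raise ValueError("crop size exceeds image dimensions")
    left = rng.randint(0, width - size)  # randint is inclusive on both ends
    top = rng.randint(0, height - size)
    return left, top, left + size, top + size


cfg = TrainConfig()
# Assumed 256 x 256 input, chosen only to make the crop offsets visible.
test_box = center_crop_box(256, 256, cfg.image_size)
train_box = random_crop_box(256, 256, cfg.image_size, random.Random(0))
```

Keeping the crop geometry separate from any image library makes the train/test asymmetry easy to check: the test crop is deterministic, while the training crop depends on the random generator passed in.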