Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Spot the Fake: Large Multimodal Model-Based Synthetic Image Detection with Artifact Explanation

Authors: Siwei Wen, Junyan Ye, Peilin Feng, Hengrui Kang, Zichen Wen, Yize Chen, Jiang Wu, Wenjun Wu, Conghui He, Weijia Li

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive evaluations across multiple datasets confirm the superiority of FakeVLM in both authenticity classification and artifact explanation tasks, setting a new benchmark for synthetic image detection. The code, model weights, and dataset can be found here: https://github.com/opendatalab/FakeVLM. 1 Introduction As AI-generated content technologies advance, synthetic images are increasingly integrated into our daily lives [1, 2, 3, 4]. ... 5 Experiment In this section, we introduce three additional datasets used in the experiments, alongside FakeClue, and describe our experimental setup. We then present FakeVLM's performance on general synthetic and DeepFake detection tasks, as well as its ability to explain image artifacts. Finally, we conduct ablation studies and further exploratory experiments to assess the model's performance.
Researcher Affiliation	Academia	1Shanghai Artificial Intelligence Laboratory, 2Sun Yat-Sen University, 3Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing, Beihang University, 4Shanghai Jiao Tong University, 5The Chinese University of Hong Kong, Shenzhen
Pseudocode	No	The paper does not contain structured pseudocode or algorithm blocks. It includes diagrams like Figure 1 ('Construction pipeline of FakeClue dataset') and Figure 2 ('Overview of FakeVLM') but no textual pseudocode.
Open Source Code	Yes	The code, model weights, and dataset can be found here: https://github.com/opendatalab/FakeVLM.
Open Datasets	Yes	Additionally, we present FakeClue, a comprehensive dataset containing over 100,000 images across seven categories, annotated with fine-grained artifact clues in natural language. ... The code, model weights, and dataset can be found here: https://github.com/opendatalab/FakeVLM. ... For open synthetic datasets, we extracted approximately 80K data from GenImage [54], FF++ [55] and Chameleon [56], maintaining a 1:1 ratio of fake to real data.
Dataset Splits	Yes	Training/test sets are randomly split; the test set contains 5,000 diverse image samples. Detailed dataset information is provided in the supplementary materials. ... For FF++ and DD-VQA, we use their default training-test splits for evaluation.
Hardware Specification	Yes	The training is conducted for two epochs on eight NVIDIA A100 GPUs with a batch size of 32 per GPU using a 2e-5 learning rate with 3% linear warmup and cosine decay.
Software Dependencies	No	The paper mentions using LLaVA-1.5 7B and Vicuna-v1.5-7B as base models, but it does not provide specific version numbers for ancillary software components like programming languages, libraries, or frameworks (e.g., Python, PyTorch, CUDA versions).
Experiment Setup	Yes	The training is conducted for two epochs on eight NVIDIA A100 GPUs with a batch size of 32 per GPU using a 2e-5 learning rate with 3% linear warmup and cosine decay. This full fine-tuning adapted the model to synthetic data detection/explanation nuances while preserving its general instruction-following capabilities.
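
The training settings quoted in the Hardware Specification and Experiment Setup rows (eight GPUs, batch size 32 per GPU, 2e-5 learning rate, 3% linear warmup, cosine decay) can be illustrated with a minimal Python sketch. This is not the authors' training code. The function names, the decay-to-zero floor, the absence of gradient accumulation, and the total step count used in the example are all assumptions made for illustration; only the numeric hyperparameters come from the quoted text.

```python
import math

# Values quoted in the table above.
PEAK_LR = 2e-5
WARMUP_FRACTION = 0.03
NUM_GPUS = 8
PER_GPU_BATCH = 32


def effective_batch_size(num_gpus=NUM_GPUS, per_gpu_batch=PER_GPU_BATCH,
                         grad_accum_steps=1):
    """Global batch size per optimizer step.

    grad_accum_steps=1 is an assumption; the quoted text does not
    mention gradient accumulation.
    """
    return num_gpus * per_gpu_batch * grad_accum_steps


def learning_rate(step, total_steps, peak_lr=PEAK_LR,
                  warmup_fraction=WARMUP_FRACTION):
    """Linear warmup to peak_lr, then cosine decay.

    Decaying all the way to zero is an assumption; the quoted text
    does not state a minimum learning rate.
    """
    warmup_steps = max(1, round(total_steps * warmup_fraction))
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / warmup_steps
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total_steps = 1000  # hypothetical; the real count depends on dataset size
    print("effective batch size:", effective_batch_size())
    for step in (0, 15, 30, 500, 1000):
        print(step, learning_rate(step, total_steps))
```

Under these assumptions the global batch is 8 × 32 = 256 samples per optimizer step, and the learning rate peaks at the end of the first 3% of steps before decaying.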