Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

IDseq: Decoupled and Sequentially Detecting and Grounding Multi-Modal Media Manipulation

Authors: Runxin Liu, Tian Xie, Jiaming Li, Lingyun Yu, Hongtao Xie

AAAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show the superiority of our IDseq, where it notably outperforms SOTA methods on the fine-grained classification by 3.8% in mAP and the forgery face grounding by 8.7% in IoU_mean, even 1.3% in F1 on the most challenging manipulated text grounding. ... We conduct experiments on the DGM4 dataset (Shao, Wu, and Liu 2023), which comprises 230,000 image-text paired samples... Evaluation Metric. We report our results following the original evaluation protocols and metrics (Shao, Wu, and Liu 2023).
Researcher Affiliation | Academia | 1 University of Science and Technology of China, Hefei, China; 2 Anhui University, Hefei, China; EMAIL, EMAIL, EMAIL
Pseudocode | No | The paper describes the methodology using text, mathematical formulations, and diagrams (Figures 3, 4, and 5) but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain an explicit statement about releasing source code, nor does it provide a link to a code repository.
Open Datasets | Yes | We conduct experiments on the DGM4 dataset (Shao, Wu, and Liu 2023), which comprises 230,000 image-text paired samples, including over 77,000 pristine pairs and 152,000 manipulated pairs.
Dataset Splits | No | We train our IDseq on the training set and evaluate its performance on the test set. ... The input images are resized into 224 × 224, and the text sequence is padded with a max length of 50 for both training and testing. The paper mentions training and test sets but does not provide specific percentages or sample counts for these splits.
Hardware Specification | Yes | The model is trained on four Nvidia A40 GPUs with batch size 128 for 50 epochs.
Software Dependencies | No | We implement our model on PyTorch (Paszke et al. 2019). The paper mentions PyTorch as the framework but does not specify a version number or list other key software components with their versions.
Experiment Setup | Yes | The initial learning rates for encoders and the others are set to 1e-5 and 1e-4 under a cosine schedule. The model is trained on four Nvidia A40 GPUs with batch size 128 for 50 epochs. The input images are resized into 224 × 224, and the text sequence is padded with a max length of 50 for both training and testing. ... where λ1 = 1, λ2 = 1, λ3 = 0.1 and λ4 = 1, λ5 = 0.1, λ6 = 0.1, following the hyperparameter settings of the baseline (Shao, Wu, and Liu 2023).
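The quoted setup can be collected into a small configuration sketch for anyone attempting a reproduction. This is an illustrative reconstruction, not the authors' code: only the numeric values come from the paper; the loss-component structure, the `pad_id`, and the per-epoch cosine schedule without warmup are assumptions.

```python
import math

# Values quoted from the paper; everything else is assumed for illustration.
IMAGE_SIZE = (224, 224)   # input images resized to 224 x 224
MAX_TEXT_LEN = 50         # text sequences padded to a max length of 50
BATCH_SIZE = 128
EPOCHS = 50
ENCODER_LR = 1e-5         # initial LR for the encoders
OTHER_LR = 1e-4           # initial LR for the remaining modules

# Loss weights lambda_1..lambda_6 from the paper (component names unspecified).
LOSS_WEIGHTS = [1.0, 1.0, 0.1, 1.0, 0.1, 0.1]


def total_loss(components):
    """Weighted sum of the six component losses."""
    assert len(components) == len(LOSS_WEIGHTS)
    return sum(w * l for w, l in zip(LOSS_WEIGHTS, components))


def cosine_lr(epoch, base_lr, total_epochs=EPOCHS):
    """Cosine schedule decaying base_lr toward 0 over total_epochs.

    Assumed per-epoch and without warmup; the paper only states
    'under a cosine schedule'.
    """
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * epoch / total_epochs))


def pad_tokens(token_ids, max_len=MAX_TEXT_LEN, pad_id=0):
    """Truncate or right-pad a token-id list to max_len (pad_id assumed)."""
    clipped = token_ids[:max_len]
    return clipped + [pad_id] * (max_len - len(clipped))
```

In a real reproduction these constants would feed a PyTorch optimizer with two parameter groups (encoders at 1e-5, everything else at 1e-4) and a `CosineAnnealingLR` scheduler; the pure-Python form above just makes the reported hyperparameters explicit and checkable.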