Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
IDseq: Decoupled and Sequentially Detecting and Grounding Multi-Modal Media Manipulation
Authors: Runxin Liu, Tian Xie, Jiaming Li, Lingyun Yu, Hongtao Xie
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show the superiority of our IDseq, where it notably outperforms SOTA methods on the fine-grained classification by 3.8% in mAP and the forgery face grounding by 8.7% in IoUmean, even 1.3% in F1 on the most challenging manipulated text grounding. ... We conduct experiments on the DGM4 dataset (Shao, Wu, and Liu 2023), which comprises 230,000 image-text paired samples... Evaluation Metric. We report our results following the original evaluation protocols and metrics (Shao, Wu, and Liu 2023). |
| Researcher Affiliation | Academia | 1 University of Science and Technology of China, Hefei, China 2 Anhui University, Hefei, China |
| Pseudocode | No | The paper describes the methodology using text, mathematical formulations, and diagrams (Figure 3, 4, 5) but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We conduct experiments on the DGM4 dataset (Shao, Wu, and Liu 2023), which comprises 230,000 image-text paired samples, including over 77,000 pristine pairs and 152,000 manipulated pairs. |
| Dataset Splits | No | We train our IDseq on the training set and evaluate its performance on the test set. ... The input images are resized into 224 × 224, and the text sequence is padded with a max length of 50 for both training and testing. The paper mentions training and test sets but does not provide specific percentages or sample counts for these splits. |
| Hardware Specification | Yes | The model is trained on four Nvidia A40 GPUs with batch size 128 for 50 epochs. |
| Software Dependencies | No | We implement our model on PyTorch (Paszke et al. 2019). The paper mentions PyTorch as the framework but does not specify a version number or list other key software components with their versions. |
| Experiment Setup | Yes | The initial learning rates for encoders and the others are set to 1e-5 and 1e-4 under a cosine schedule. The model is trained on four Nvidia A40 GPUs with batch size 128 for 50 epochs. The input images are resized into 224 × 224, and the text sequence is padded with a max length of 50 for both training and testing. ... where λ1 = 1, λ2 = 1, λ3 = 0.1 and λ4 = 1, λ5 = 0.1, λ6 = 0.1, following the hyperparameter settings of the baseline (Shao, Wu, and Liu 2023). |
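The reported setup (encoder LR 1e-5, other parameters 1e-4, cosine schedule, 50 epochs) can be sketched as a plain learning-rate schedule. This is a minimal reconstruction, not the authors' code: the decay floor (`eta_min = 0`) and per-epoch (rather than per-step) annealing are assumptions the paper does not state.

```python
import math

# Values reported in the paper: two parameter groups with different
# initial learning rates, annealed with a cosine schedule over 50 epochs.
EPOCHS = 50
BASE_LRS = {"encoders": 1e-5, "others": 1e-4}  # from the quoted setup


def cosine_lr(base_lr, epoch, total_epochs=EPOCHS, eta_min=0.0):
    """Cosine-annealed learning rate at a given epoch.

    eta_min (the final floor) is an assumption; the paper only
    says 'under a cosine schedule'.
    """
    return eta_min + 0.5 * (base_lr - eta_min) * (
        1 + math.cos(math.pi * epoch / total_epochs)
    )


# Per-group schedule for every epoch of training.
schedule = {
    name: [cosine_lr(lr, e) for e in range(EPOCHS)]
    for name, lr in BASE_LRS.items()
}
```

Both groups start at their reported initial rates and decay toward zero by epoch 50; in a framework like PyTorch the same effect is typically achieved with two optimizer parameter groups and a single cosine scheduler.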