Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

CLIPDrag: Combining Text-based and Drag-based Instructions for Image Editing

Authors: Ziqi Jiang, Zhen Wang, Long Chen

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that CLIPDrag outperforms existing single drag-based methods or text-based methods. ... To show the performance of CLIPDrag we compared both drag-based methods (DragDiffusion, FreeDrag, RegionDrag, StableDrag, InstantDrag, LightningDrag), and text-based method (DiffCLIP) on text-drag image editing tasks. ... All input images are from the DRAGBENCH datasets (Shi et al., 2024b). ... Quantitative results are shown in Figure 6(b).
Researcher Affiliation | Academia | Ziqi Jiang, Zhen Wang, Long Chen, The Hong Kong University of Science and Technology, EMAIL, EMAIL
Pseudocode | No | The paper describes methods and processes like 'Global-Local Motion Supervision' and 'Fast Point Tracking' in detailed text and mathematical formulations (e.g., equations 1-7), but it does not present any explicitly labeled pseudocode blocks or algorithms in a structured, code-like format.
Open Source Code | Yes | Codes: https://github.com/ZiQi-Jiang/CLIPDrag.
Open Datasets | Yes | All input images are from the DRAGBENCH datasets (Shi et al., 2024b).
Dataset Splits | No | The paper mentions using 'DRAGBENCH datasets' and comparing methods 'on the DRAGBENCH benchmark with five different max iteration step settings,' but it does not provide specific details about training, validation, or test splits (e.g., percentages, sample counts, or explicit splitting methodology).
Hardware Specification | Yes | The result is calculated on a single 3090 GPU by averaging over 100 examples sampled from the DragBench.
Software Dependencies | Yes | We used Stable Diffusion 1.5 (Rombach et al., 2022) and CLIP-ViT-B/16 (Dosovitskiy et al., 2020) as the base model.
Experiment Setup | Yes | For the LoRA finetuning stage, we set the training steps as 80, and the rank as 16 with a small learning rate of 0.0005. In the DDIM inversion, we set the inversion strength to 0.7 and the total denoising steps to 50. In the motion supervision, we had a large maximum optimization step of 2000, ensuring handles could reach the targets. The features were extracted from the last layer of the U-Net. The radius for motion supervision (r1) and point tracking (r2) were set to 4 and 12, respectively. The weight λ in the Global-Local Gradient Fusion process was 0.7.
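The hyperparameters quoted in the Experiment Setup row can be collected into a single configuration sketch. This is only an illustration: the dictionary keys below are hypothetical names chosen for readability, while the values are the ones stated in the paper. The derived step count assumes the common convention (used by DragDiffusion-style pipelines) that the inversion strength scales the number of denoising steps actually inverted.

```python
# Hypothetical configuration sketch of CLIPDrag's reported setup.
# Key names are illustrative; values are taken from the quoted Experiment Setup row.
clipdrag_config = {
    "lora": {
        "train_steps": 80,
        "rank": 16,
        "learning_rate": 5e-4,
    },
    "ddim": {
        "inversion_strength": 0.7,
        "denoising_steps": 50,
    },
    "motion_supervision": {
        "max_optimization_steps": 2000,   # large cap so handles can reach targets
        "feature_layer": "unet_last",     # features from the last U-Net layer
        "radius_r1": 4,                   # motion-supervision radius
        "radius_r2": 12,                  # point-tracking radius
    },
    "global_local_fusion": {
        "lambda": 0.7,                    # weight in Global-Local Gradient Fusion
    },
}

# Assuming inversion strength scales the inverted step count (a common
# convention, not stated explicitly in the quote): 0.7 * 50 = 35 steps.
inverted_steps = round(
    clipdrag_config["ddim"]["inversion_strength"]
    * clipdrag_config["ddim"]["denoising_steps"]
)
print(inverted_steps)  # 35
```

Under this reading, DDIM inversion stops at 35 of the 50 scheduled steps, which is where the drag optimization would operate before the remaining denoising completes the edit.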