Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Visual Instruction Inversion: Image Editing via Image Prompting
Authors: Thao Nguyen, Yuheng Li, Utkarsh Ojha, Yong Jae Lee
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare our approach against both image-editing and visual prompting frameworks, on both synthetic and real images. In Section 4.2, we present qualitative results, followed by a quantitative comparison in Section 4.3. Both quantitative and qualitative results demonstrate that our approach not only achieves competitive performance to state-of-the-art models, but also has additional merits in specific cases. |
| Researcher Affiliation | Academia | Thao Nguyen Yuheng Li Utkarsh Ojha Yong Jae Lee University of Wisconsin-Madison |
| Pseudocode | Yes | Algorithm 1 Visual Instruction Inversion (VISII) |
| Open Source Code | No | The paper provides a project webpage (https://thaoshibe.github.io/visii/) but no explicit statement about open-sourcing the code or a direct link to a code repository. |
| Open Datasets | Yes | We randomly sampled images from the Clean-Instruct Pix2Pix dataset [4], which consists of synthetic paired before-after images with corresponding descriptions. |
| Dataset Splits | No | The paper mentions total image pair counts used for evaluation but does not specify explicit training, validation, and test splits with percentages or sample counts for reproducibility. |
| Hardware Specification | Yes | All experiments are conducted on a 4 NVIDIA RTX 3090 machine. |
| Software Dependencies | No | The paper mentions software components like 'pretrained clip-vit-large-patch14' and 'Instruct Pix2Pix' but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We use the frozen pretrained Instruct Pix2Pix [4] to optimize the instruction c_T for N = 1000 steps, T = 1000 timesteps. We use AdamW optimizer [25] with learning rate γ = 0.001, λmse = 4, and λclip = 0.1. Text guidance and image guidance scores are set at their default value of 7.5 and 1.5, respectively. |
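
The values quoted in the Experiment Setup row can be gathered into a small configuration object for anyone attempting a reproduction. The sketch below is illustrative only: the class name `VisiiSetup`, its field names, and the helper `weighted_loss` are hypothetical, and combining the two loss terms as a weighted sum is an assumption suggested by the λ coefficients, not something this table states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VisiiSetup:
    """Hyperparameters quoted in the Experiment Setup row (field names are hypothetical)."""

    optimization_steps: int = 1000     # N
    diffusion_timesteps: int = 1000    # T
    learning_rate: float = 0.001       # gamma, reported with the AdamW optimizer
    lambda_mse: float = 4.0            # weight on the MSE loss term
    lambda_clip: float = 0.1           # weight on the CLIP loss term
    text_guidance_scale: float = 7.5   # reported default
    image_guidance_scale: float = 1.5  # reported default


def weighted_loss(mse_term: float, clip_term: float, setup: VisiiSetup) -> float:
    """Combine two loss terms as a weighted sum (an assumed form, see the note above)."""
    return setup.lambda_mse * mse_term + setup.lambda_clip * clip_term


if __name__ == "__main__":
    setup = VisiiSetup()
    # Placeholder loss values, chosen only to exercise the function.
    print(weighted_loss(mse_term=0.5, clip_term=2.0, setup=setup))
```

Freezing the dataclass keeps the reported values from being changed accidentally during a run; any deviation from the paper's setup then has to be made explicitly by constructing a new instance.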