Visual Instruction Inversion: Image Editing via Image Prompting

Authors: Thao Nguyen, Yuheng Li, Utkarsh Ojha, Yong Jae Lee

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We compare our approach against both image-editing and visual prompting frameworks, on both synthetic and real images. In Section 4.2, we present qualitative results, followed by a quantitative comparison in Section 4.3. Both quantitative and qualitative results demonstrate that our approach not only achieves competitive performance to state-of-the-art models, but also has additional merits in specific cases.
Researcher Affiliation | Academia | Thao Nguyen, Yuheng Li, Utkarsh Ojha, Yong Jae Lee (University of Wisconsin-Madison)
Pseudocode | Yes | Algorithm 1 Visual Instruction Inversion (VISII)
Open Source Code | No | The paper provides a project webpage (https://thaoshibe.github.io/visii/) but no explicit statement about open-sourcing the code or a direct link to a code repository.
Open Datasets | Yes | We randomly sampled images from the Clean-Instruct Pix2Pix dataset [4], which consists of synthetic paired before-after images with corresponding descriptions.
Dataset Splits | No | The paper reports total image-pair counts used for evaluation but does not specify explicit training, validation, and test splits with percentages or sample counts for reproducibility.
Hardware Specification | Yes | All experiments are conducted on a 4 NVIDIA RTX 3090 machine.
Software Dependencies | No | The paper mentions software components such as 'pretrained clip-vit-large-patch14' and 'Instruct Pix2Pix' but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | We use the frozen pretrained InstructPix2Pix [4] to optimize the instruction c_T for N = 1000 steps, T = 1000 timesteps. We use the AdamW optimizer [25] with learning rate γ = 0.001, λ_mse = 4, and λ_clip = 0.1. Text guidance and image guidance scales are set at their default values of 7.5 and 1.5, respectively. (See the sketch below the table.)
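
To make the pseudocode and experiment-setup rows concrete, here is a minimal sketch of how an instruction-optimization loop in the spirit of Algorithm 1 could be wired up with the Hugging Face diffusers InstructPix2Pix pipeline. It uses the hyperparameters reported above (N = 1000 steps, T = 1000 timesteps, AdamW with learning rate 0.001, λ_mse = 4, λ_clip = 0.1); the model id, the helpers load_example_pair and clip_direction_loss, and the exact loss wiring are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F
from diffusers import DDPMScheduler, StableDiffusionInstructPix2PixPipeline

device = "cuda"
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float32
).to(device)
# The diffusion model stays frozen; only the instruction embedding is optimized.
pipe.unet.requires_grad_(False)
pipe.vae.requires_grad_(False)
pipe.text_encoder.requires_grad_(False)
noise_scheduler = DDPMScheduler.from_pretrained(
    "timbrooks/instruct-pix2pix", subfolder="scheduler"
)

def encode_latents(image):
    # image: (1, 3, H, W) tensor scaled to [-1, 1]
    return pipe.vae.encode(image).latent_dist.sample() * pipe.vae.config.scaling_factor

# Before/after images of the example edit (load_example_pair is a hypothetical helper).
before, after = load_example_pair()
z_before = encode_latents(before.to(device))  # conditioning latents, kept clean
z_after = encode_latents(after.to(device))    # target latents to be recovered

# Learnable instruction embedding c_T, initialized from the empty prompt.
ids = pipe.tokenizer(
    "", padding="max_length", max_length=pipe.tokenizer.model_max_length,
    return_tensors="pt",
).input_ids.to(device)
c_T = pipe.text_encoder(ids)[0].detach().clone().requires_grad_(True)

opt = torch.optim.AdamW([c_T], lr=1e-3)  # γ = 0.001 in the paper
lambda_mse, lambda_clip = 4.0, 0.1

for step in range(1000):  # N = 1000 optimization steps
    noise = torch.randn_like(z_after)
    t = torch.randint(0, 1000, (1,), device=device)  # T = 1000 diffusion timesteps
    noisy = noise_scheduler.add_noise(z_after, noise, t)
    # InstructPix2Pix conditions its UNet on the before-image latents by
    # channel-wise concatenation with the noisy target latents.
    pred = pipe.unet(
        torch.cat([noisy, z_before], dim=1), t, encoder_hidden_states=c_T
    ).sample
    loss = lambda_mse * F.mse_loss(pred, noise) \
        + lambda_clip * clip_direction_loss(c_T, before, after)  # hypothetical CLIP term
    opt.zero_grad()
    loss.backward()
    opt.step()

# At test time, the learned c_T replaces (or is concatenated with) the usual text
# embedding, with text and image guidance scales of 7.5 and 1.5, respectively.
```

Because only c_T receives gradients while the UNet, VAE, and text encoder remain frozen, the memory footprint stays close to that of a single InstructPix2Pix forward/backward pass, which is consistent with the RTX 3090 hardware noted above.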