reproducibilityindex.ai

Visual Instruction Inversion: Image Editing via Image Prompting

Authors: Thao Nguyen, Yuheng Li, Utkarsh Ojha, Yong Jae Lee

NeurIPS 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We compare our approach against both image-editing and visual prompting frameworks, on both synthetic and real images. In Section 4.2, we present qualitative results, followed by a quantitative comparison in Section 4.3. Both quantitative and qualitative results demonstrate that our approach not only achieves competitive performance to state-of-the-art models, but also has additional merits in specific cases.
Researcher Affiliation	Academia	Thao Nguyen, Yuheng Li, Utkarsh Ojha, Yong Jae Lee (University of Wisconsin-Madison)
Pseudocode	Yes	Algorithm 1 Visual Instruction Inversion (VISII)
Open Source Code	No	The paper provides a project webpage (https://thaoshibe.github.io/visii/) but no explicit statement about open-sourcing the code or a direct link to a code repository.
Open Datasets	Yes	We randomly sampled images from the Clean-InstructPix2Pix dataset [4], which consists of synthetic paired before-after images with corresponding descriptions.
Dataset Splits	No	The paper mentions total image pair counts used for evaluation but does not specify explicit training, validation, and test splits with percentages or sample counts for reproducibility.
Hardware Specification	Yes	All experiments are conducted on a 4 NVIDIA RTX 3090 machine.
Software Dependencies	No	The paper mentions software components like 'pretrained clip-vit-large-patch14' and 'InstructPix2Pix' but does not provide specific version numbers for these or other software dependencies.
Experiment Setup	Yes	We use the frozen pretrained InstructPix2Pix [4] to optimize the instruction c_T for N = 1000 steps, T = 1000 timesteps. We use AdamW optimizer [25] with learning rate γ = 0.001, λ_mse = 4, and λ_clip = 0.1. Text guidance and image guidance scores are set at their default values of 7.5 and 1.5, respectively.
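
The hyperparameters in the Experiment Setup row can be gathered into one configuration object, which may help when trying to reproduce the run. The sketch below is illustrative only: the class name, field names, and the `combined_loss` helper are hypothetical, and the weighted-sum form of the loss is an assumption, since the row lists only the two weights. Only the numeric values come from the row itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VisiiConfig:
    """Hypothetical container for the values quoted in the Experiment Setup row."""
    optimization_steps: int = 1000      # N: optimization steps for the instruction
    diffusion_timesteps: int = 1000     # T: diffusion timesteps
    learning_rate: float = 0.001        # gamma, used with the AdamW optimizer
    lambda_mse: float = 4.0             # weight on the MSE loss term
    lambda_clip: float = 0.1            # weight on the CLIP loss term
    text_guidance_scale: float = 7.5    # default text guidance score
    image_guidance_scale: float = 1.5   # default image guidance score


def combined_loss(mse_loss: float, clip_loss: float, cfg: VisiiConfig) -> float:
    """Assumed objective: a weighted sum of the two loss terms."""
    return cfg.lambda_mse * mse_loss + cfg.lambda_clip * clip_loss


cfg = VisiiConfig()
# Example with made-up loss values, only to show how the weights are applied.
total = combined_loss(mse_loss=0.5, clip_loss=1.0, cfg=cfg)
```

Freezing the dataclass keeps the reported values from being changed by accident during a run; a variant setting can be created with `dataclasses.replace(cfg, learning_rate=...)`.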