Image Translation as Diffusion Visual Programmers

Authors: Cheng Han, James Chenhao Liang, Qifan Wang, Majid Rabbani, Sohail Dianat, Raghuveer Rao, Ying Nian Wu, Dongfang Liu

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 4 EXPERIMENTS 4.1 IMPLEMENTATION DETAILS Benchmarks. For quantitative and qualitative results, we conduct a new benchmark (see E), consisting of 100 diverse text-image pairs. ... Evaluation Metrics. We follow (Ruiz et al., 2023; Chen et al., 2023), and calculate the CLIP-Score (Hessel et al., 2021) and DINO-Score (Caron et al., 2021). ... 4.2 COMPARISONS WITH CURRENT METHODS Qualitative Results. ... Quantitative Comparisons. ...
Researcher Affiliation | Collaboration | Rochester Institute of Technology, University of Missouri-Kansas City, Meta AI, DEVCOM Army Research Laboratory, University of California, Los Angeles
Pseudocode | Yes | A IMPLEMENTATION DETAILS AND PSEUDO-CODE OF DVP ... Algorithm 1: Condition-flexible diffusion model inversion ... Pseudo-code 1: Pseudo-code of instance normalization used in condition-flexible diffusion model in a PyTorch-like style. ... Pseudo-code 2: Pseudo-code of in-context visual programming in a PyTorch-like style. (See Sketch 1 below the table.)
Open Source Code | Yes | Our demo page is released at here. To further guarantee reproducibility, our full implementation and code are publicly released.
Open Datasets | Yes | For quantitative and qualitative results, we conduct a new benchmark (see E), consisting of 100 diverse text-image pairs. Specifically, we manually pick images from the web, generated images, ImageNet-R (Hendrycks et al., 2021), ImageNet (Russakovsky et al., 2015), MS COCO (Lin et al., 2014), and other previous work (Ruiz et al., 2023; Tumanyan et al., 2023).
Dataset Splits | No | The paper describes the datasets used and evaluation metrics, but does not provide specific training, validation, and test dataset splits for reproduction.
Hardware Specification | Yes | Experiments are conducted on one NVIDIA TESLA A100-80GB SXM GPU.
Software Dependencies | Yes | Our work is implemented in PyTorch (Paszke et al., 2019). ... We choose GPT-4 (OpenAI, 2023) as our Planner, discussed in 3.2, utilizing the official OpenAI Python API. ... Furthermore, we utilize Mask2Former (Cheng et al., 2022) as a segmenter, Repaint (Lugmayr et al., 2022) as an inpainter, and BLIP (Li et al., 2022) as a prompter. (See Sketch 2 below the table.)
Experiment Setup | Yes | For all experiments, we utilize diffusion model (Song et al., 2021) deterministic sampling with 50 steps. ... We choose the AdamW (Loshchilov & Hutter, 2019) optimizer with an initial learning rate of 1e-5, betas = (0.9, 0.999), eps = 1e-8, and a weight decay of 0.01 as default. ... The maximum length for generated programs is set to 256, and the temperature is set to 0 for the most deterministic generation. (See Sketches 2 and 3 below the table.)
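
The sketches below illustrate the rows above; they are hedged reconstructions of the reported setup, not the authors' released implementation. First, the Pseudocode row cites a PyTorch-style pseudo-code of the instance normalization used in the condition-flexible diffusion model. A minimal sketch, assuming the normalization is applied per sample and per channel over the spatial dimensions of an (N, C, H, W) feature tensor; the paper's exact tensor layout is not reproduced here.

Sketch 1 (instance normalization, PyTorch):

    import torch

    def instance_norm(feat: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
        # Per-sample, per-channel statistics over the spatial dimensions.
        mean = feat.mean(dim=(2, 3), keepdim=True)
        std = feat.std(dim=(2, 3), keepdim=True) + eps  # eps guards against division by zero
        return (feat - mean) / std

    # Usage: normalize a batch of 4 feature maps with 64 channels at 32x32.
    x = torch.randn(4, 64, 32, 32)
    y = instance_norm(x)

torch.nn.functional.instance_norm provides the same operation; the explicit version above mirrors the pseudo-code framing used in the paper's appendix.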
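Second, the Software Dependencies row reports that GPT-4 serves as the Planner via the official OpenAI Python API, and the Experiment Setup row fixes the temperature to 0 and the maximum program length to 256. A minimal sketch, assuming the pre-1.0 openai package; plan_program and the prompt wording are hypothetical, and only the decoding settings come from the paper.

Sketch 2 (GPT-4 Planner call, Python):

    import openai

    openai.api_key = "YOUR_API_KEY"  # assumption: supplied via config or environment

    def plan_program(instruction: str) -> str:
        # Only temperature=0 and max_tokens=256 are taken from the reported setup;
        # the prompt below is illustrative, not the paper's actual Planner prompt.
        response = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[{
                "role": "user",
                "content": "Decompose this image-editing instruction into "
                           "program steps: " + instruction,
            }],
            temperature=0,   # most deterministic generation, as reported
            max_tokens=256,  # maximum length for generated programs
        )
        return response["choices"][0]["message"]["content"]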
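Third, the Experiment Setup row fully specifies the optimizer, which maps directly onto torch.optim.AdamW; the 50-step deterministic sampler (Song et al., 2021) is DDIM sampling, sketched here with the diffusers scheduler on the assumption that any DDIM implementation with eta = 0 matches the reported setup.

Sketch 3 (optimizer and deterministic sampling, PyTorch):

    import torch
    from diffusers import DDIMScheduler

    # Placeholder module standing in for the network being tuned.
    model = torch.nn.Linear(4, 4)

    # Optimizer exactly as reported: lr 1e-5, betas (0.9, 0.999), eps 1e-8, weight decay 0.01.
    optimizer = torch.optim.AdamW(
        model.parameters(), lr=1e-5, betas=(0.9, 0.999), eps=1e-8, weight_decay=0.01
    )

    # Deterministic DDIM sampling with 50 steps; eta defaults to 0 (no added noise).
    scheduler = DDIMScheduler(num_train_timesteps=1000)
    scheduler.set_timesteps(50)
    print(scheduler.timesteps)  # the 50 denoising steps the sampler will visit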