Image Translation as Diffusion Visual Programmers
Authors: Cheng Han, James Chenhao Liang, Qifan Wang, Majid Rabbani, Sohail Dianat, Raghuveer Rao, Ying Nian Wu, Dongfang Liu
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4 EXPERIMENTS 4.1 IMPLEMENTATION DETAILS Benchmarks. For quantitative and qualitative results, we conduct a new benchmark (see E), consisting of 100 diverse text-image pairs. ... Evaluation Metrics. We follow (Ruiz et al., 2023; Chen et al., 2023), and calculate the CLIP-Score (Hessel et al., 2021) and DINO-Score (Caron et al., 2021). ... 4.2 COMPARISONS WITH CURRENT METHODS Qualitative Results. ... Quantitative Comparisons. ... |
| Researcher Affiliation | Collaboration | Rochester Institute of Technology, University of Missouri-Kansas City, Meta AI, DEVCOM Army Research Laboratory, University of California, Los Angeles |
| Pseudocode | Yes | A IMPLEMENTATION DETAILS AND PSEUDO-CODE OF DVP ... Algorithm 1: Condition-flexible diffusion model inversion ... Pseudo-code 1: Pseudo-code of instance normalization used in condition-flexible diffusion model in a PyTorch-like style. ... Pseudo-code 2: Pseudo-code of in-context visual programming in a PyTorch-like style. (A minimal instance-normalization sketch in this style appears after the table.) |
| Open Source Code | Yes | Our demo page is released at here. To further guarantee reproducibility, our full implementation and code are publicly released. |
| Open Datasets | Yes | For quantitative and qualitative results, we conduct a new benchmark (see E), consisting of 100 diverse text-image pairs. Specifically, we manually pick images from web, generated images, ImageNet-R (Hendrycks et al., 2021), ImageNet (Russakovsky et al., 2015), MS COCO (Lin et al., 2014), and other previous work (Ruiz et al., 2023; Tumanyan et al., 2023). |
| Dataset Splits | No | The paper describes the datasets used and evaluation metrics, but does not provide specific training, validation, and test dataset splits for reproduction. |
| Hardware Specification | Yes | Experiments are conducted on one NVIDIA TESLA A100-80GB SXM GPU. |
| Software Dependencies | Yes | Our work is implemented in PyTorch (Paszke et al., 2019). ... We choose the GPT-4 (OpenAI, 2023) as our Planner discussed in 3.2, utilizing the official OpenAI Python API. ... Furthermore, we utilize Mask2Former (Cheng et al., 2022) as a segmenter, Repaint (Lugmayr et al., 2022) as an inpainter, and BLIP (Li et al., 2022) as a prompter. |
| Experiment Setup | Yes | For all experiments, we utilize diffusion model (Song et al., 2021) deterministic sampling with 50 steps. ... We choose AdamW (Loshchilov & Hutter, 2019) optimizer with an initial learning rate of 1e-5, betas = (0.9, 0.999), eps = 1e-8 and weight decays of 0.01 as default. ... The maximum length for generated programs is set to 256, and the temperature is set to 0 for the most deterministic generation. (A configuration sketch mirroring these settings appears after the table.) |
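
The Pseudocode row above refers to a PyTorch-style instance normalization used inside the condition-flexible diffusion model. The paper's Pseudo-code 1 is not reproduced here; the function below is only a minimal, generic instance-normalization sketch over latent feature maps, with the `eps` value and tensor shapes chosen for illustration rather than taken from the paper.

```python
import torch


def instance_norm(x: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Normalize each channel of each sample to zero mean and unit variance.

    x is assumed to have shape (batch, channels, height, width); statistics
    are computed over the spatial dimensions only, so every instance and
    channel is normalized independently.
    """
    mean = x.mean(dim=(2, 3), keepdim=True)
    var = x.var(dim=(2, 3), keepdim=True, unbiased=False)
    return (x - mean) / torch.sqrt(var + eps)


# Example: normalizing a batch of latent feature maps (shapes are illustrative).
latents = torch.randn(2, 4, 64, 64)
normalized = instance_norm(latents)
```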
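
The Experiment Setup and Software Dependencies rows pin down most of the reported hyperparameters. The sketch below shows one way those settings could be mirrored in PyTorch; the use of diffusers' `DDIMScheduler` for the 50-step deterministic sampler, the openai v1 client, the placeholder latent tensor, and the `plan_prompt` string are assumptions for illustration, not the paper's released code.

```python
import torch
from diffusers import DDIMScheduler  # assumed implementation of DDIM sampling (Song et al., 2021)
from openai import OpenAI            # assumed v1 flavor of the official OpenAI Python API

# 50-step deterministic DDIM sampling, as reported in the paper.
scheduler = DDIMScheduler(num_train_timesteps=1000)
scheduler.set_timesteps(50)

# AdamW with the exact settings quoted in the Experiment Setup row.
# The optimized tensor is a placeholder latent; the excerpt does not say
# which parameters are trained.
latents = torch.randn(1, 4, 64, 64, requires_grad=True)
optimizer = torch.optim.AdamW(
    [latents],
    lr=1e-5,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0.01,
)

# GPT-4 planner call with the reported generation settings.
# `plan_prompt` is a hypothetical placeholder for the in-context planning prompt.
client = OpenAI()
plan_prompt = "Translate the cat in the photo into a tiger."  # illustrative only
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": plan_prompt}],
    max_tokens=256,  # maximum length for generated programs
    temperature=0,   # most deterministic generation
)
program = response.choices[0].message.content
```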