CycleNet: Rethinking Cycle Consistency in Text-Guided Diffusion for Image Manipulation
Authors: Sihan Xu, Ziqiao Ma, Yidong Huang, Honglak Lee, Joyce Chai
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical studies show that CycleNet is superior in translation consistency and quality, and can generate high-quality images for out-of-domain distributions with a simple change of the textual prompt. The empirical results demonstrate that compared to previous approaches, CycleNet is superior in translation faithfulness, cycle consistency, and image quality. |
| Researcher Affiliation | Collaboration | University of Michigan; LG AI Research |
| Pseudocode | Yes | The pseudocode for training is given in Algo. 1. |
| Open Source Code | Yes | Our code is available at https://github.com/sled-group/CycleNet. |
| Open Datasets | Yes | Additionally, we introduce ManiCups, a dataset of state-level image manipulation that tasks models to manipulate cups by filling or emptying liquid to/from containers... Our data is available at https://huggingface.co/datasets/sled-umich/ManiCups. (A hedged loading sketch appears after the table.) |
| Dataset Splits | No | Table 3: The statistics of the ManiCups dataset, with 3 abundant domains and 2 low-resource domains. Table 4: The statistics of the Yosemite summer↔winter, horse↔zebra, and apple↔orange datasets. (These tables report train and test splits, but no explicit validation split beyond the val/loss_simple_ema monitor metric.) |
| Hardware Specification | Yes | We train our model with a batch size of 4 on a single A40 GPU. |
| Software Dependencies | No | The paper lists versions for specific models (e.g., Stable Diffusion v1.5, v2.1) but does not provide version numbers for ancillary software dependencies such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | In the training of CycleNet, the weights of our three loss functions are respectively set as λ1 = 1, λ2 = 0.1, and λ3 = 0.01. We train the model for 50k steps. ... Our full configuration is reproduced below the table. |

The full training configuration quoted in the response:

```yaml
model:
  params:
    linear_start: 0.00085
    linear_end: 0.0120
    num_timesteps_cond: 1
    timesteps: 1000
    image_size: 64
    channels: 4
    cond_stage_trainable: false
    monitor: val/loss_simple_ema
    scale_factor: 0.18215
    use_ema: False
    only_mid_control: False
    recon_weight: 1     # lambda1
    disc_weight: 0.1    # lambda2
    cycle_weight: 0.01  # lambda3
    disc_mode: eps
    consis_weight: 0.1
```
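The λ weights in the setup map onto the `recon_weight`, `disc_weight`, and `cycle_weight` keys above. A minimal sketch of how such a weighted objective combines, assuming the three loss terms are computed elsewhere (the function and argument names are illustrative, not CycleNet's actual code):

```python
import torch

def total_loss(l_recon: torch.Tensor,
               l_disc: torch.Tensor,
               l_cycle: torch.Tensor,
               recon_weight: float = 1.0,     # lambda1 in the paper
               disc_weight: float = 0.1,      # lambda2
               cycle_weight: float = 0.01) -> torch.Tensor:
    # Weighted sum of the three training objectives; only the weights
    # come from the paper, the loss terms themselves are placeholders.
    return recon_weight * l_recon + disc_weight * l_disc + cycle_weight * l_cycle
```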
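Since ManiCups is hosted on the HuggingFace Hub, it should be loadable with the `datasets` library. A minimal sketch, assuming the repository exposes standard splits (the split and column names are not confirmed here):

```python
from datasets import load_dataset

# Fetch the ManiCups dataset from the HuggingFace Hub; the available
# splits and features can be inspected from the returned DatasetDict.
manicups = load_dataset("sled-umich/ManiCups")
print(manicups)
```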