CycleNet: Rethinking Cycle Consistency in Text-Guided Diffusion for Image Manipulation

Authors: Sihan Xu, Ziqiao Ma, Yidong Huang, Honglak Lee, Joyce Chai

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our empirical studies show that CycleNet is superior in translation consistency and quality, and can generate high-quality images for out-of-domain distributions with a simple change of the textual prompt." "The empirical results demonstrate that compared to previous approaches, CycleNet is superior in translation faithfulness, cycle consistency, and image quality."
Researcher Affiliation | Collaboration | University of Michigan; LG AI Research
Pseudocode | Yes | "The pseudocode for training is given in Algo. 1."
Open Source Code | Yes | "Our code is available at https://github.com/sled-group/CycleNet."
Open Datasets | Yes | "Additionally, we introduce ManiCups, a dataset of state-level image manipulation that tasks models to manipulate cups by filling or emptying liquid to/from containers... Our data is available at https://huggingface.co/datasets/sled-umich/ManiCups."
Dataset Splits | No | "Table 3: The statistics of the ManiCups dataset, with 3 abundant domains and 2 low-resource domains." "Table 4: The statistics of the Yosemite summer↔winter, horse↔zebra, and apple↔orange datasets." (These tables report Train and Test splits, but no explicit validation split beyond a monitored metric.)
Hardware Specification | Yes | "We train our model with a batch size of 4 on a single A40 GPU."
Software Dependencies | No | The paper lists versions for specific models (e.g., Stable Diffusion v1.5 and v2.1) but does not give version numbers for general software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | "In the training of CycleNet, the weights of our three loss functions are respectively set as λ1 = 1, λ2 = 0.1, and λ3 = 0.01. We train the model for 50k steps. ... Our configuration is as follows:"

model:
  params:
    linear_start: 0.00085
    linear_end: 0.0120
    num_timesteps_cond: 1
    timesteps: 1000
    image_size: 64
    channels: 4
    cond_stage_trainable: false
    monitor: val/loss_simple_ema
    scale_factor: 0.18215
    use_ema: False
    only_mid_control: False
    recon_weight: 1     # lambda1
    disc_weight: 0.1    # lambda2
    cycle_weight: 0.01  # lambda3
    disc_mode: eps
    consis_weight: 0.1
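As a minimal sketch of how the three weighted loss terms in the configuration above might be combined into a training objective: only the weights (λ1 = 1, λ2 = 0.1, λ3 = 0.01) come from the paper; the function and argument names here are hypothetical placeholders, not the authors' implementation.

```python
# Hypothetical sketch of CycleNet's weighted training objective.
# Only the three weights are taken from the paper's configuration;
# the per-term loss values are assumed to be computed elsewhere.

RECON_WEIGHT = 1.0    # lambda1: reconstruction loss weight
DISC_WEIGHT = 0.1     # lambda2: discriminative loss weight
CYCLE_WEIGHT = 0.01   # lambda3: cycle-consistency loss weight

def total_loss(recon_loss: float, disc_loss: float, cycle_loss: float) -> float:
    """Weighted sum of the three loss terms, using the paper's weights."""
    return (RECON_WEIGHT * recon_loss
            + DISC_WEIGHT * disc_loss
            + CYCLE_WEIGHT * cycle_loss)
```

With unit losses for each term, the weighted sum is simply 1 + 0.1 + 0.01 = 1.11, which makes the relative emphasis on reconstruction over the cycle term easy to see.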