Cocktail: Mixing Multi-Modality Control for Text-Conditional Image Generation
Authors: Minghui Hu, Jianbin Zheng, Daqing Liu, Chuanxia Zheng, Chaoyue Wang, Dacheng Tao, Tat-Jen Cham
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we conduct a comprehensive experimental analysis to validate the efficacy and superiority of the proposed method through ablation studies and application demonstrations. In Sec. 4.1, we present both quantitative and qualitative results that elucidate the comparative advantages of our approach. |
| Researcher Affiliation | Collaboration | Nanyang Technological University, South China University of Technology, University of Oxford, The University of Sydney, JD Explore Academy |
| Pseudocode | No | The paper does not contain a dedicated 'Pseudocode' or 'Algorithm' section, nor does it present any structured algorithm blocks. |
| Open Source Code | Yes | The codes are released at https://mhh0318.github.io/cocktail/. |
| Open Datasets | Yes | All of our experiments are performed on the LAION-AESTHETICS-6.5 dataset, which contains about 600K image-text pairs with predicted aesthetics scores higher than 6.5. |
| Dataset Splits | No | The paper states that experiments are performed on the LAION-AESTHETICS-6.5 dataset and evaluates on the COCO5k validation set and the COCO validation set. However, it does not provide specific percentages or sample counts for the training, validation, and test splits of the LAION-AESTHETICS-6.5 dataset, which are necessary for reproducing the experiment's data partitioning. |
| Hardware Specification | Yes | trained for 20 epochs with a batch size of 64 on 4 NVIDIA 80G-A100 GPUs within 4 days. |
| Software Dependencies | No | The paper mentions 'Stable Diffusion v2.1' as the base model and the 'AdamW optimizer' and 'DDIM sampler' as components, but it does not provide specific version numbers for underlying software libraries or frameworks (e.g., PyTorch, TensorFlow, CUDA) required to reproduce the experiment. |
| Experiment Setup | Yes | gControlNet is adapted from the pretrained Stable Diffusion v2.1 in this paper and trained for 20 epochs with a batch size of 64 on 4 NVIDIA 80G-A100 GPUs within 4 days. We use the AdamW optimizer with a learning rate of 3.0e-05. All the training images in LAION-AESTHETICS-6.5 are first resized to 512 by the short side and then randomly cropped to 512×512. During inference, the sampler is DDIM, the sampling steps are 50, and the classifier-free guidance scale is 9.0 by default. (See the configuration sketches after this table.) |
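To make the reported training configuration concrete, below is a minimal PyTorch sketch of the preprocessing and optimizer setup quoted in the Experiment Setup row: resize the short side to 512, random-crop to 512×512, and optimize with AdamW at a learning rate of 3.0e-05. The `gcontrolnet` module here is a hypothetical placeholder; none of these identifiers come from the released code, so treat this as an illustration of the stated settings, not the authors' implementation.

```python
import torch
from torchvision import transforms

# Preprocessing as reported: scale the shorter edge to 512,
# then take a random 512x512 crop.
train_transform = transforms.Compose([
    transforms.Resize(512),      # int arg resizes the shorter edge to 512
    transforms.RandomCrop(512),  # random 512x512 crop
    transforms.ToTensor(),
])

# Hypothetical stand-in for the paper's gControlNet branch; the released
# code defines its own architecture.
gcontrolnet = torch.nn.Linear(4, 4)  # placeholder module

# AdamW with the reported learning rate of 3.0e-05.
optimizer = torch.optim.AdamW(gcontrolnet.parameters(), lr=3.0e-5)

# Reported effective batch size: 64 (spread over 4 A100-80G GPUs in the paper).
# dataset = ...  # LAION-AESTHETICS-6.5 image-text pairs
# loader = torch.utils.data.DataLoader(dataset, batch_size=64, shuffle=True)
```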
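The inference settings (DDIM sampler, 50 steps, classifier-free guidance scale 9.0) can likewise be sketched with the Hugging Face diffusers library, which the paper does not name as its toolkit. The checkpoint id `stabilityai/stable-diffusion-2-1` is our assumption for the stated Stable Diffusion v2.1 base model, and this runs the plain text-to-image pipeline rather than the paper's gControlNet-conditioned model.

```python
import torch
from diffusers import StableDiffusionPipeline, DDIMScheduler

# Load the Stable Diffusion v2.1 base model the paper adapts from
# (checkpoint id assumed, not given in the paper).
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
)

# Swap in the DDIM sampler reported in the paper.
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")

# Reported defaults: 50 sampling steps, classifier-free guidance scale 9.0.
image = pipe(
    "a photograph of an astronaut riding a horse",  # example prompt
    num_inference_steps=50,
    guidance_scale=9.0,
).images[0]
image.save("sample.png")
```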