StyleDrop: Text-to-Image Synthesis of Any Style
Authors: Kihyuk Sohn, Lu Jiang, Jarred Barber, Kimin Lee, Nataniel Ruiz, Dilip Krishnan, Huiwen Chang, Yuanzhen Li, Irfan Essa, Michael Rubinstein, Yuan Hao, Glenn Entis, Irina Blok, Daniel Castro Chin
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments (Fig. 1) show that StyleDrop achieves unprecedented accuracy and fidelity in stylized image synthesis. |
| Researcher Affiliation | Collaboration | Kihyuk Sohn, Nataniel Ruiz, Kimin Lee, Daniel Castro Chin, Irina Blok, Huiwen Chang, Jarred Barber, Lu Jiang, Glenn Entis, Yuanzhen Li, Yuan Hao, Irfan Essa, Michael Rubinstein, Dilip Krishnan (Google Research). Now at Korea Advanced Institute of Science and Technology (KAIST). Now at OpenAI. |
| Pseudocode | Yes | Example code explaining how to apply an adapter to the output of an attention layer and how to generate adapter weights is given in Fig. S1. |
| Open Source Code | Yes | More results are available at our project website: https://styledrop.github.io. |
| Open Datasets | Yes | We provide image sources in Tab. S1 and attribute their ownership. |
| Dataset Splits | No | The paper mentions training steps and batch size, but does not provide explicit train/validation/test dataset splits (e.g., percentages or sample counts) for reproducibility. |
| Hardware Specification | Yes | Note that we use a batch size of 8 (1 per TPU v3 core), but StyleDrop can also be optimized on a single GPU (e.g., A100) with a batch size of 1. |
| Software Dependencies | No | The paper mentions various software components and models such as 'Muse [5]', 'Adam optimizer [19]', 'T5-XXL [30] encoder', 'VQGAN [10, 42]', 'CLIP [29]', 'DreamBooth [34]', 'Imagen [35]', 'Stable Diffusion [33]', and 'LoRA [16]', but it does not specify their version numbers for reproducibility. |
| Experiment Setup | Yes | For all experiments, we update adapter weights for 1000 steps using the Adam optimizer [19] with a learning rate of 0.00003. Hyperparameters for the optimizer, adapter architecture, and synthesis are provided in Tab. S3. |
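The "Pseudocode" row refers to the paper's Fig. S1, which applies a lightweight adapter to the output of an attention layer. A minimal NumPy sketch of that idea is below, assuming a low-rank residual bottleneck with a ReLU nonlinearity and a zero-initialized up-projection; the exact adapter architecture and initialization are assumptions, not the paper's verbatim code.

```python
import numpy as np

def adapter(attn_out, w_down, w_up, scale=1.0):
    """Low-rank residual adapter applied to an attention-layer output
    (a sketch; the true architecture is specified in the paper's Fig. S1)."""
    # Down-project to a small bottleneck, apply a nonlinearity,
    # project back up, and add the result to the attention output.
    h = np.maximum(attn_out @ w_down, 0.0)  # ReLU bottleneck
    return attn_out + scale * (h @ w_up)

# Toy dimensions: model width 16, adapter rank 4.
rng = np.random.default_rng(0)
attn_out = rng.normal(size=(2, 16))       # (batch, d_model)
w_down = rng.normal(size=(16, 4)) * 0.01  # down-projection
w_up = np.zeros((4, 16))                  # zero init: adapter starts as identity
out = adapter(attn_out, w_down, w_up)
```

With the up-projection initialized to zero, the adapted layer initially reproduces the base model's output exactly, and only the small `w_down`/`w_up` matrices are updated during the 1000 fine-tuning steps described in the Experiment Setup row.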