Stylus: Automatic Adapter Selection for Diffusion Models
Authors: Michael Luo, Justin Wong, Brandon Trabucco, Yanping Huang, Joseph E. Gonzalez, Zhifeng Chen, Ruslan Salakhutdinov, Ion Stoica
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate Stylus, we developed Stylus Docs, a curated dataset featuring 75K adapters with pre-computed adapter embeddings. In our evaluation on popular Stable Diffusion checkpoints, Stylus achieves greater CLIP/FID Pareto efficiency and is twice as preferred, with humans and multimodal models as evaluators, over the base model. |
| Researcher Affiliation | Collaboration | Michael Luo¹, Justin Wong¹, Brandon Trabucco², Yanping Huang³, Joseph E. Gonzalez¹, Zhifeng Chen³, Ruslan Salakhutdinov², Ion Stoica¹; ¹UC Berkeley, ²CMU MLD, ³Google DeepMind |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks (clearly labeled algorithm sections or code-like formatted procedures). |
| Open Source Code | Yes | Stylus is already open-source, with all experiments made available to individuals online on GitHub. |
| Open Datasets | Yes | We evaluate Stylus over a cross product of two datasets, Microsoft COCO [22] and Parti Prompts [53] |
| Dataset Splits | Yes | We evaluate on the COCO 2014 validation dataset, with 10K sampled prompts. |
| Hardware Specification | Yes | We launched 16 replicas of Stylus and Stable Diffusion on 8 A100-80GB GPUs for 4 weeks to generate images for evaluation. |
| Software Dependencies | Yes | We assess Stylus against Stable-Diffusion-v1.5 [40] as the baseline model. ... In our experiments, these improved descriptions were generated by Gemini Ultra [43] ... In our experiments, we create embeddings from OpenAI's text-embedding-3-large model [21, 30]. ... In our implementation, we choose Gemini 1.5, with a 128K context window, as the composer's LLM |
| Experiment Setup | Yes | Our image generation process integrates directly with Stable-Diffusion Web UI [1] and defaults to 35 denoising steps using the default DPM Solver++ scheduler [26]. To replicate high-quality images from existing users, we enable high-resolution upscaling to generate 1024x1024 from 512x512 images, with the default latent upscaler [17] and denoising strength set to 0.7. |
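The Research Type row reports that Stylus achieves greater CLIP/FID Pareto efficiency. As a rough illustration of what that comparison means, the sketch below finds the non-dominated points among (CLIP, FID) measurements, assuming higher CLIP and lower FID are both better. The function name `pareto_front` and the sample numbers are hypothetical and are not results from the paper.

```python
from typing import List, Tuple

# Each point is (label, clip_score, fid). Higher CLIP is better; lower FID is better.
Point = Tuple[str, float, float]


def pareto_front(points: List[Point]) -> List[str]:
    """Return the labels of points that no other point dominates.

    A point q dominates p if q is at least as good on both metrics
    (CLIP >= and FID <=) and strictly better on at least one.
    """
    front = []
    for label, clip, fid in points:
        dominated = any(
            c2 >= clip and f2 <= fid and (c2 > clip or f2 < fid)
            for _, c2, f2 in points
        )
        if not dominated:
            front.append(label)
    return front


# Illustrative (made-up) measurements, e.g. one per guidance-scale setting.
points = [("a", 0.30, 20.0), ("b", 0.31, 22.0), ("c", 0.29, 21.0)]
print(pareto_front(points))  # ['a', 'b']: "c" is dominated by "a"
```

Under this reading, one method is more Pareto-efficient than another when its curve of (CLIP, FID) points lies on or beyond the other's front.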
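The Software Dependencies row notes that adapter embeddings are pre-computed with text-embedding-3-large. A minimal sketch of how such pre-computed embeddings could be searched is shown below. It assumes cosine similarity and a top-k cutoff, omits the actual embedding API call, and uses toy 3-dimensional vectors. The adapter names and the helper `top_k_adapters` are hypothetical and do not describe Stylus's exact retriever.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_adapters(
    query: Sequence[float], adapters: Dict[str, Sequence[float]], k: int
) -> List[str]:
    """Rank pre-computed adapter embeddings by similarity to the prompt embedding."""
    ranked = sorted(adapters.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [name for name, _ in ranked[:k]]


# Toy embeddings standing in for real text-embedding-3-large vectors (hypothetical).
adapters = {
    "watercolor_style": [1.0, 0.0, 0.0],
    "cyberpunk_city": [0.0, 1.0, 0.0],
    "pixel_art": [0.7, 0.7, 0.0],
}
prompt_embedding = [0.9, 0.1, 0.0]
print(top_k_adapters(prompt_embedding, adapters, k=2))  # ['watercolor_style', 'pixel_art']
```

Pre-computing the adapter side means that only the prompt has to be embedded at query time. The shortlist can then be handed to a downstream step such as the LLM composer mentioned in the same row.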