Is This Loss Informative? Faster Text-to-Image Customization by Tracking Objective Dynamics
Authors: Anton Voronov, Mikhail Khoroshikh, Artem Babenko, Max Ryabinin
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on Stable Diffusion for 48 different concepts and three personalization methods demonstrate the competitive performance of our approach, which makes adaptation up to 8 times faster with no significant drops in quality. |
| Researcher Affiliation | Collaboration | Anton Voronov (MIPT, Yandex); Mikhail Khoroshikh (HSE University, Yandex); Artem Babenko (HSE University, Yandex); Max Ryabinin (HSE University, Yandex) |
| Pseudocode | Yes | `def DVAR(losses, window_size, threshold): running_var = losses[-window_size:].var(); total_var = losses.var(); ratio = running_var / total_var; return ratio < threshold` |
| Open Source Code | Yes | The code of our experiments is available at github.com/yandex-research/DVAR. |
| Open Datasets | Yes | For evaluation, we combine the datasets published by authors of the three techniques above that were available as of March 2023, which results in a total of 48 concepts. |
| Dataset Splits | Yes | These hyperparameters are chosen on a held-out set of 4 concepts to achieve 90% of the maximum possible train CLIP image score for the least number of iterations. |
| Hardware Specification | Yes | Each experiment used a single NVIDIA A100 80GB GPU. |
| Software Dependencies | No | The paper mentions using the 'Diffusers library' and 'Stable Diffusion v1.5' but does not provide specific version numbers for these software dependencies (e.g., 'Diffusers 0.10.0' or 'PyTorch 1.9'). While 'v1.5' is a model version, it lacks broader software environment versioning. |
| Experiment Setup | Yes | In our experiments, we found {N = 310, α = 0.15} for Textual Inversion, {N = 440, α = 0.4} for DreamBooth, and {N = 180, α = 0.15} for Custom Diffusion to work relatively well across all concepts we evaluated. |
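The DVAR criterion quoted in the Pseudocode row can be turned into a small runnable function. The sketch below is an illustrative NumPy implementation, not the authors' released code: the function name `dvar_should_stop` is ours, and the default `window_size`/`threshold` values are placeholders (the paper's per-method values of N and α, e.g. {N = 310, α = 0.15} for Textual Inversion, would fill those roles).

```python
import numpy as np

def dvar_should_stop(losses, window_size=25, threshold=0.15):
    """Illustrative DVAR early-stopping check.

    Signals convergence when the variance of the most recent
    `window_size` losses is small relative to the variance of the
    entire loss history, i.e. the loss curve has flattened out.
    """
    losses = np.asarray(losses, dtype=np.float64)
    if len(losses) < window_size:
        return False  # not enough history to form a window yet
    running_var = losses[-window_size:].var()  # variance of recent window
    total_var = losses.var()                   # variance of full history
    if total_var == 0.0:
        return False  # degenerate flat history; ratio undefined
    return bool(running_var / total_var < threshold)
```

In use, the check would be called once per adaptation step: a loss curve that has decayed and then plateaued yields a near-zero recent variance relative to the total, tripping the threshold, while a still-fluctuating curve keeps the ratio near 1 and training continues.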