Is One GPU Enough? Pushing Image Generation at Higher-Resolutions with Foundation Models.
Authors: Athanasios Tragakis, Marco Aversa, Chaitanya Kaul, Roderick Murray-Smith, Daniele Faccio
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 5 (Experiments): Pixelsmith is tested on a single RTX 3090 GPU, with all tested resolutions requiring 8.4 GB of memory. Performance is evaluated on the LAION-5B dataset Schuhmann et al. [2022] by randomly sampling 1,000 image and text prompt pairs. The metrics used for evaluation are Fréchet Inception Distance (FID) Heusel et al. [2017], Kernel Inception Distance (KID) Bińkowski et al. [2018], Inception Score (IS) Salimans et al. [2016], and CLIP Score Radford et al. [2021], with the FID metric computed using the clean-FID approach Parmar et al. [2022] (for further comparisons, see Appendix B). |
| Researcher Affiliation | Collaboration | Athanasios Tragakis (University of Glasgow, Glasgow, United Kingdom; a.tragakis.1@research.gla.ac.uk); Marco Aversa (Dotphoton, Zug, Switzerland; marco.aversa@dotphoton.com); Chaitanya Kaul (University of Glasgow, Glasgow, United Kingdom; chaitanya.kaul@glasgow.ac.uk); Roderick Murray-Smith (University of Glasgow, Glasgow, United Kingdom; Roderick.Murray-Smith@glasgow.ac.uk); Daniele Faccio (University of Glasgow, Glasgow, United Kingdom; Daniele.Faccio@glasgow.ac.uk) |
| Pseudocode | No | The paper describes the framework and processes but does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks. |
| Open Source Code | Yes | The code for our work is available at https://thanos-db.github.io/Pixelsmith/. |
| Open Datasets | Yes | Performance is evaluated on the LAION-5B dataset Schuhmann et al. [2022] by randomly sampling 1,000 image and text prompt pairs. |
| Dataset Splits | No | The paper evaluates performance on 1,000 randomly sampled image and text prompt pairs from the LAION-5B dataset, but it does not specify training, validation, or test splits or their proportions. |
| Hardware Specification | Yes | Pixelsmith is tested on a single RTX 3090 GPU, with all tested resolutions requiring 8.4 GB of memory. |
| Software Dependencies | No | The paper discusses various models and frameworks used or compared against (e.g., SDXL, DALL-E) but does not provide specific version numbers for software dependencies or libraries. |
| Experiment Setup | Yes | We conducted a quantitative examination of our framework at a resolution of 2048 × 2048 pixels, focusing on key factors that influence its performance: the Slider position, the role of amplitude and phase in the latent space, the importance of masking during guidance, and the impact of averaging overlapping patches. |
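The Research Type row lists FID and KID among the evaluation metrics. As a minimal sketch, assuming Inception-v3 feature vectors for the reference and generated images have already been extracted into `(N, D)` NumPy arrays, both distances can be computed from those features as below. FID is the Fréchet distance between Gaussians fitted to the two feature sets, ‖μ_r − μ_g‖² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^{1/2}); KID is the unbiased squared MMD with the cubic polynomial kernel k(x, y) = (xᵀy / d + 1)³. The function names are hypothetical, and the sketch does not reproduce the clean-FID package or its image-resizing pipeline.

```python
import numpy as np


def _psd_sqrt(mat: np.ndarray) -> np.ndarray:
    """Square root of a symmetric positive semi-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # drop tiny negative eigenvalues caused by round-off
    return (vecs * np.sqrt(vals)) @ vecs.T


def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two (N, D) feature arrays (the FID formula)."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((cov_a cov_b)^{1/2}) equals Tr((s cov_b s)^{1/2}) with s = cov_a^{1/2},
    # because the two matrices are similar; the latter is symmetric PSD, so eigvalsh applies.
    s = _psd_sqrt(cov_a)
    tr_sqrt = np.sqrt(np.clip(np.linalg.eigvalsh(s @ cov_b @ s), 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)


def kernel_inception_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Unbiased squared MMD with the cubic polynomial kernel k(x, y) = (x.y / d + 1)^3."""
    d = feats_a.shape[1]
    k_aa = (feats_a @ feats_a.T / d + 1.0) ** 3
    k_bb = (feats_b @ feats_b.T / d + 1.0) ** 3
    k_ab = (feats_a @ feats_b.T / d + 1.0) ** 3
    m, n = len(feats_a), len(feats_b)
    # Exclude the diagonal (self-pair) terms from the within-set means so the estimator is unbiased.
    mean_aa = (k_aa.sum() - np.trace(k_aa)) / (m * (m - 1))
    mean_bb = (k_bb.sum() - np.trace(k_bb)) / (n * (n - 1))
    return float(mean_aa + mean_bb - 2.0 * k_ab.mean())
```

Reference KID implementations typically average this estimate over several random subsets of the features; the sketch computes it once on the full arrays.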
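The Experiment Setup row ablates the role of amplitude and phase in the latent space. The sketch below shows only the generic operation behind such a study, assuming a `(C, H, W)` NumPy latent: a 2D FFT over the spatial axes separates each channel into amplitude and phase, and a latent can be rebuilt from the amplitude of one source and the phase of another. The helper names and the mixing direction are hypothetical illustrations, not Pixelsmith's implementation.

```python
import numpy as np


def split_amplitude_phase(latent: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """2D FFT over the last two (spatial) axes of a (C, H, W) latent; returns (amplitude, phase)."""
    spectrum = np.fft.fft2(latent, axes=(-2, -1))
    return np.abs(spectrum), np.angle(spectrum)


def combine_amplitude_phase(amplitude: np.ndarray, phase: np.ndarray) -> np.ndarray:
    """Rebuild a spatial latent from amplitude and phase; the imaginary part is round-off and is dropped."""
    spectrum = amplitude * np.exp(1j * phase)
    return np.real(np.fft.ifft2(spectrum, axes=(-2, -1)))


def mix_amplitude_phase(amp_source: np.ndarray, phase_source: np.ndarray) -> np.ndarray:
    """Hypothetical mix: keep amp_source's amplitude spectrum and phase_source's phase spectrum."""
    amplitude, _ = split_amplitude_phase(amp_source)
    _, phase = split_amplitude_phase(phase_source)
    return combine_amplitude_phase(amplitude, phase)
```

Because both inputs are real, their spectra are Hermitian-symmetric (even amplitude, odd phase), so the recombined spectrum inverts to a real latent up to floating-point round-off.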