Is One GPU Enough? Pushing Image Generation at Higher-Resolutions with Foundation Models.

Authors: Athanasios Tragakis, Marco Aversa, Chaitanya Kaul, Roderick Murray-Smith, Daniele Faccio

NeurIPS 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | 5 Experiments: Pixelsmith is tested on a single RTX 3090 GPU, with all tested resolutions requiring 8.4 GB of memory. Performance is evaluated on the LAION-5B dataset Schuhmann et al. [2022] by randomly sampling 1,000 image and text prompt pairs. The metrics used for evaluation are Fréchet Inception Distance (FID) Heusel et al. [2017], Kernel Inception Distance (KID) Bińkowski et al. [2018], Inception Score (IS) Salimans et al. [2016], and CLIP Score Radford et al. [2021], with the FID metric computed using the clean-FID approach Parmar et al. [2022] (for further comparisons, see Appendix B). |
| Researcher Affiliation | Collaboration | Athanasios Tragakis, University of Glasgow, Glasgow, United Kingdom, a.tragakis.1@research.gla.ac.uk; Marco Aversa, Dotphoton, Zug, Switzerland, marco.aversa@dotphoton.com; Chaitanya Kaul, University of Glasgow, Glasgow, United Kingdom, chaitanya.kaul@glasgow.ac.uk; Roderick Murray-Smith, University of Glasgow, Glasgow, United Kingdom, Roderick.Murray-Smith@glasgow.ac.uk; Daniele Faccio, University of Glasgow, Glasgow, United Kingdom, Daniele.Faccio@glasgow.ac.uk |
| Pseudocode | No | The paper describes the framework and processes but does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks. |
| Open Source Code | Yes | The code for our work is available at https://thanos-db.github.io/Pixelsmith/. |
| Open Datasets | Yes | Performance is evaluated on the LAION-5B dataset Schuhmann et al. [2022] by randomly sampling 1,000 image and text prompt pairs. |
| Dataset Splits | No | The paper states it evaluates performance on the LAION-5B dataset by randomly sampling 1,000 image and text prompt pairs, but it does not explicitly specify training, validation, or test dataset splits or percentages. |
| Hardware Specification | Yes | Pixelsmith is tested on a single RTX 3090 GPU, with all tested resolutions requiring 8.4 GB of memory. |
| Software Dependencies | No | The paper discusses various models and frameworks used or compared against (e.g., SDXL, DALL-E) but does not provide specific version numbers for software dependencies or libraries. |
| Experiment Setup | Yes | We conducted a quantitative examination of our framework at a resolution of 2048 × 2048 pixels, focusing on key factors that influence its performance: the Slider position, the role of amplitude and phase in the latent space, the importance of masking during guidance, and the impact of averaging overlapping patches. |
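
The evaluation above draws 1,000 random image and text prompt pairs from LAION-5B, but the summary does not say how the sample was drawn or whether a seed was fixed. As a minimal sketch of how such a subset could be made repeatable, the Python snippet below picks distinct indices with a seeded generator. The function name `sample_eval_indices`, the seed value, and the pool size of 50,000 are assumptions for illustration only; none of them comes from the paper.

```python
import random


def sample_eval_indices(dataset_size: int, k: int = 1000, seed: int = 0) -> list[int]:
    """Return k distinct indices in [0, dataset_size), reproducibly for a given seed."""
    if k > dataset_size:
        raise ValueError("k cannot exceed dataset_size")
    # A local generator keeps the draw independent of the global random state.
    rng = random.Random(seed)
    # Sampling from a range avoids materialising the full index list.
    return sorted(rng.sample(range(dataset_size), k))


# Hypothetical pool size; the real dataset is far larger.
indices = sample_eval_indices(dataset_size=50_000, k=1000, seed=0)
```

Recording the seed and the resulting index list alongside the metric values would let another group score exactly the same 1,000 pairs.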