LiteVAE: Lightweight and Efficient Variational Autoencoders for Latent Diffusion Models
Authors: Seyedmorteza Sadat, Jakob Buhmann, Derek Bradley, Otmar Hilliges, Romann M. Weber
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experimentation, we show that LiteVAE considerably reduces the computational cost of the standard VAE encoder while maintaining the same level of reconstruction quality. Our base LiteVAE model matches the quality of the established VAEs in current LDMs with a six-fold reduction in encoder parameters, leading to faster training and lower GPU memory requirements, while our larger model outperforms VAEs of comparable complexity across all evaluated metrics (rFID, LPIPS, PSNR, and SSIM). |
| Researcher Affiliation | Collaboration | ¹ETH Zürich, ²DisneyResearch\|Studios |
| Pseudocode | Yes | Appendix G: Pseudocode for different LiteVAE blocks (an illustrative sketch of the block design appears after this table) |
| Open Source Code | No | While we do not provide open access to our codebase, the hyperparameters, algorithms, and implementation details are provided in the appendix to ensure reproducibility. |
| Open Datasets | Yes | FFHQ [30] (256×256), ImageNet [57] |
| Dataset Splits | No | The paper uses standard public datasets like FFHQ and ImageNet but does not explicitly detail training, validation, and test splits with percentages or sample counts. |
| Hardware Specification | Yes | The values are measured on one Quadro RTX 6000. |
| Software Dependencies | No | Our implementation of the UNet used for feature extraction and aggregation closely follows the ADM model [10] without spatial down/upsampling layers. We use Adam optimizer [34] with a learning rate of 10⁻⁴ and (β1, β2) = (0.5, 0.9). (Specific version numbers for software libraries are not provided.) |
| Experiment Setup | Yes | All models were trained with a batch size of 16 on two GPUs until the autoencoder could produce high-quality reconstructions. The training duration was 200k steps for the ImageNet 128×128 models, and 100k for the ImageNet 256×256 and FFHQ models. We use Adam optimizer [34] with a learning rate of 10⁻⁴ and (β1, β2) = (0.5, 0.9). (A minimal training-setup sketch appears after this table.) |
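The paper's Appendix G contains the authors' pseudocode for the LiteVAE blocks; we do not reproduce it here. As a rough, non-authoritative illustration of the design the paper describes (a multi-level wavelet transform feeding lightweight feature-extraction and aggregation networks), below is a minimal PyTorch sketch. Everything in it is an assumption for illustration: `WaveletFeatureEncoder`, the plain convolution stacks standing in for the paper's feature-extraction and aggregation UNets, and all channel and level counts are hypothetical, not the authors' code.

```python
import torch
from torch import nn

def haar_dwt(x: torch.Tensor):
    """Single-level 2D Haar DWT on an NCHW tensor.

    Slices the even/odd rows and columns and forms the four orthonormal
    Haar combinations. Returns (LL, LH, HL, HH), each at half the
    spatial resolution of the input.
    """
    a = x[..., 0::2, 0::2]
    b = x[..., 0::2, 1::2]
    c = x[..., 1::2, 0::2]
    d = x[..., 1::2, 1::2]
    ll = (a + b + c + d) / 2
    lh = (a + b - c - d) / 2
    hl = (a - b + c - d) / 2
    hh = (a - b - c + d) / 2
    return ll, lh, hl, hh

class WaveletFeatureEncoder(nn.Module):
    """Toy stand-in for a LiteVAE-style encoder: multi-level Haar DWT,
    a small conv stack per level in place of the feature-extraction UNets,
    and a conv layer in place of the aggregation UNet."""

    def __init__(self, in_ch=3, feat_ch=32, latent_ch=4, levels=3):
        super().__init__()
        self.levels = levels
        # One lightweight extractor per DWT level (4 subbands x in_ch channels).
        self.extractors = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(4 * in_ch, feat_ch, 3, padding=1),
                nn.SiLU(),
                nn.Conv2d(feat_ch, feat_ch, 3, padding=1),
            )
            for _ in range(levels)
        )
        self.aggregator = nn.Conv2d(levels * feat_ch, 2 * latent_ch, 3, padding=1)

    def forward(self, x):
        feats, cur = [], x
        for lvl in range(self.levels):
            ll, lh, hl, hh = haar_dwt(cur)
            bands = torch.cat([ll, lh, hl, hh], dim=1)
            f = self.extractors[lvl](bands)
            # Pool every level's features to the deepest level's resolution.
            scale = 2 ** (self.levels - 1 - lvl)
            if scale > 1:
                f = nn.functional.avg_pool2d(f, scale)
            feats.append(f)
            cur = ll  # recurse on the low-frequency band
        h = self.aggregator(torch.cat(feats, dim=1))
        mean, logvar = h.chunk(2, dim=1)  # Gaussian posterior parameters
        return mean, logvar
```

On a 256×256 input, three DWT levels put every pooled feature map at 32×32, matching the usual 8× spatial downsampling of LDM latents; the real model replaces the conv stacks with the ADM-style UNets without down/upsampling layers noted in the table above.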
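The reported optimizer and schedule translate almost directly into code. The snippet below is a minimal sketch assuming PyTorch; the single-conv model, random batches, and MSE loss are placeholders for the actual LiteVAE model, data pipeline, and training objective. Only the Adam hyperparameters, batch size, and step counts come from the paper.

```python
import torch
from torch import nn, optim

# Hypothetical stand-in module; the real model is the LiteVAE autoencoder.
model = nn.Conv2d(3, 3, kernel_size=3, padding=1).cuda()

# Optimizer as reported: Adam, lr = 1e-4, (beta1, beta2) = (0.5, 0.9).
optimizer = optim.Adam(model.parameters(), lr=1e-4, betas=(0.5, 0.9))

batch_size = 16       # as reported (split across two GPUs in the paper)
num_steps = 200_000   # ImageNet 128x128; 100k for ImageNet 256x256 and FFHQ

for step in range(num_steps):
    # Random tensors stand in for a real ImageNet/FFHQ data loader.
    x = torch.randn(batch_size, 3, 128, 128, device="cuda")
    recon = model(x)
    # Plain MSE stands in for the paper's full reconstruction objective.
    loss = nn.functional.mse_loss(recon, x)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```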