One-Line-of-Code Data Mollification Improves Optimization of Likelihood-based Generative Models

Authors: Ba-Hien Tran, Giulio Franzese, Pietro Michiardi, Maurizio Filippone

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We report results on real-world image data sets and UCI benchmarks with popular likelihood-based GMs, including variants of variational autoencoders and normalizing flows, showing large improvements in FID score and density estimation. We consider a large set of experiments involving VAEs and NFs and some popular image data sets. These provide a challenging test for likelihood-based GMs due to the large dimensionality of the input space and to the fact that density estimation needs to deal with data lying on manifolds. The results show systematic, and in some cases dramatic, improvements in sample quality, indicating that this is a simple and effective strategy to improve optimization of likelihood-based GMs.
Researcher Affiliation | Academia | Ba-Hien Tran, Department of Data Science, EURECOM, France (ba-hien.tran@eurecom.fr); Giulio Franzese, Department of Data Science, EURECOM, France (giulio.franzese@eurecom.fr); Pietro Michiardi, Department of Data Science, EURECOM, France (pietro.michiardi@eurecom.fr); Maurizio Filippone, Department of Data Science, EURECOM, France (maurizio.filippone@eurecom.fr)
Pseudocode | Yes | Algorithm 1: Gaussian mollification; Algorithm 2: Python code for blurring mollification; Algorithm 3: Python code for noise schedules (a hedged sketch of the Gaussian mollification step appears after this table)
Open Source Code | No | The paper does not provide a direct link or explicit statement about the availability of the authors' own source code for the proposed methodology. It mentions using third-party libraries (pythae, normflows, pytorch-fid) and relies on the official implementation of a comparison method [43], but does not provide its own code.
Open Datasets | Yes | We consider two image data sets, CIFAR10 [40] and CELEBA 64 [41]. ... We consider four data sets in the UCI repository [17]: RED-WINE, WHITE-WINE, PARKINSONS, and MINIBOONE. (A loading sketch follows the table.)
Dataset Splits | Yes | We use the official train/val/test splits for both data sets. ... 10% of the data is set aside as a test set, and an additional 10% of the remaining data is used for validation. (See the split sketch after the table.)
Hardware Specification | Yes | We use NVIDIA P100 and A100 GPUs for the experiments, with 16GB and 80GB of memory respectively. All models are trained on a single GPU except for the experiments with the NVAE model [78], where we employ two A100 GPUs.
Software Dependencies | No | The paper mentions software like PyTorch [54], pythae [6], normflows [71], and pytorch-fid [footnote 1], but it does not specify explicit version numbers for these software dependencies, which is required for reproducibility.
Experiment Setup | Yes | In the experiments on synthetic data sets, we use a REAL-NVP flow [14] with 5 affine coupling layers consisting of 2 hidden layers of 64 units each. We train the model for 20,000 iterations using an Adam optimizer [33] with a learning rate of 5 × 10⁻⁴ and a mini-batch size of 256. ... We use an Adam optimizer [33] with a learning rate of 10⁻³ and a mini-batch size of 64. We train the model for 100 and 80 epochs on the CIFAR10 and CELEBA data sets, respectively. ... We use a multi-scale architecture as described in [34]. The architecture has a depth level of K = 20 and a number of levels L = 3. We use the AdaMax [33] optimizer with a learning rate of 3 × 10⁻⁴ and a mini-batch size of 64. ... We use an Adam optimizer [33] with a learning rate of 3 × 10⁻⁴ and a mini-batch size of 128. We train the model for 200 and 100 epochs on the CIFAR10 and CELEBA data sets, respectively. ... All models are trained with the Adam optimizer [33] for 150 epochs with a learning rate of 10⁻⁴ and a mini-batch size of 100. (A minimal end-to-end sketch of the synthetic-data setup follows the table.)
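
The paper's Algorithm 1 (Gaussian mollification) is not reproduced in this report. Below is a minimal sketch of the idea, assuming a variance-preserving noise schedule annealed over training; the names `vp_schedule` and `mollify` and the cosine interpolation are illustrative assumptions, not the paper's exact schedule.

```python
import math
import torch

def vp_schedule(progress: float) -> tuple[float, float]:
    # progress runs from 0 (start of training, maximal noise)
    # to 1 (end of training, clean data); alpha^2 + sigma^2 = 1,
    # so the perturbed data keeps unit variance.
    t = 1.0 - progress
    alpha = math.cos(0.5 * math.pi * t)  # signal coefficient
    sigma = math.sin(0.5 * math.pi * t)  # noise coefficient
    return alpha, sigma

def mollify(x: torch.Tensor, progress: float) -> torch.Tensor:
    # Gaussian mollification: perturb a mini-batch with Gaussian
    # noise whose magnitude is annealed over training.
    alpha, sigma = vp_schedule(progress)
    return alpha * x + sigma * torch.randn_like(x)

# Inside a training loop, this is the "one line of code":
# x = mollify(x, step / total_steps)
```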
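The paper does not describe its data-loading pipeline. One plausible way to obtain the two image data sets is via torchvision; the center-crop-then-resize preprocessing for CELEBA 64 is a common convention and an assumption here, not taken from the paper.

```python
import torchvision
import torchvision.transforms as T

# Hypothetical loading code; the paper's exact preprocessing is unspecified.
cifar_train = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=T.ToTensor())

# "CELEBA 64": center-crop then resize to 64x64 is a common convention.
celeba_train = torchvision.datasets.CelebA(
    root="./data", split="train", download=True,
    transform=T.Compose([T.CenterCrop(148), T.Resize(64), T.ToTensor()]))
```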
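The UCI split description implies an 81/9/10 train/val/test partition overall (10% held out for test, then 10% of the remainder for validation). A sketch of that split, assuming uniform shuffling with a fixed seed, neither of which the paper specifies:

```python
import numpy as np

def uci_split(X: np.ndarray, seed: int = 0):
    # 10% test, then 10% of the remainder as validation,
    # i.e. 81/9/10 train/val/test overall. Shuffling and the
    # fixed seed are assumptions; the paper does not specify them.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(0.1 * len(X))
    test, rest = idx[:n_test], idx[n_test:]
    n_val = int(0.1 * len(rest))
    val, train = rest[:n_val], rest[n_val:]
    return X[train], X[val], X[test]
```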
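The synthetic-data configuration (REAL-NVP with 5 affine coupling layers of 2 hidden layers with 64 units each, Adam at 5 × 10⁻⁴, mini-batch size 256, 20,000 iterations) is concrete enough to sketch end to end. The 2-D coupling network, the standard-normal stand-in data, and the loop below are illustrative approximations, not the paper's actual flow or data sets.

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """One affine coupling layer for 2-D data: one coordinate is
    transformed with a scale and shift predicted from the other."""
    def __init__(self, flip: bool, hidden: int = 64):
        super().__init__()
        self.flip = flip
        # 2 hidden layers of 64 units each, matching the reported setup.
        self.net = nn.Sequential(
            nn.Linear(1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2))  # outputs (log-scale, shift)

    def forward(self, x):
        x0, x1 = x.chunk(2, dim=1)
        if self.flip:
            x0, x1 = x1, x0
        log_s, t = self.net(x0).chunk(2, dim=1)
        y1 = x1 * torch.exp(log_s) + t
        y = torch.cat([y1, x0] if self.flip else [x0, y1], dim=1)
        return y, log_s.sum(dim=1)

class RealNVP(nn.Module):
    def __init__(self, n_layers: int = 5):
        super().__init__()
        self.layers = nn.ModuleList(
            AffineCoupling(flip=i % 2 == 1) for i in range(n_layers))

    def log_prob(self, x):
        log_det = torch.zeros(x.shape[0], device=x.device)
        for layer in self.layers:
            x, ld = layer(x)
            log_det = log_det + ld
        base = torch.distributions.Normal(0.0, 1.0)
        return base.log_prob(x).sum(dim=1) + log_det

model = RealNVP()
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)  # lr = 5 × 10⁻⁴

for step in range(20_000):
    x = torch.randn(256, 2)  # stand-in for the paper's synthetic data
    # Data mollification would be the single extra line here, e.g.:
    # x = mollify(x, step / 20_000)  # see the mollification sketch above
    loss = -model.log_prob(x).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```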