Wavelet Flow: Fast Training of High Resolution Normalizing Flows

Authors: Jason J. Yu, Konstantinos G. Derpanis, Marcus A. Brubaker

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "3 Experimental Evaluation" (section heading cited as evidence)
Researcher Affiliation | Collaboration | Jason J. Yu¹,³, Konstantinos G. Derpanis²,⁴,⁵ and Marcus A. Brubaker¹,³,⁵; ¹Department of Electrical Engineering and Computer Science, York University, Toronto; ²Department of Computer Science, Ryerson University, Toronto; ³Borealis AI; ⁴Samsung AI Centre Toronto; ⁵Vector Institute
Pseudocode | No | The paper describes its methods verbally and mathematically (e.g., equations 1 and 2, and the descriptions of the Wavelet Flow architecture and sampling), but it does not include any explicitly labeled 'Algorithm' or 'Pseudocode' blocks. (A hedged sketch of the multi-scale factorization follows this table.)
Open Source Code | Yes | Code for Wavelet Flow is available at the following project page: https://yorkucvil.github.io/Wavelet-Flow.
Open Datasets | Yes | To evaluate the performance of Wavelet Flow, we use several standard image datasets to directly compare against the reported results of previous methods. Specifically, we train and evaluate our model on natural image datasets at the commonly used resolutions and follow standard preprocessing: ImageNet [38] (32×32 and 64×64) and Large-scale Scene Understanding (LSUN) bedroom, tower, and church outdoor [46] (64×64). We also train on two high-resolution datasets at resolutions not previously reported: CelebFaces Attributes High-Quality (CelebA-HQ) [21] (1024×1024) and Flickr-Faces-HQ (FFHQ) [22] (1024×1024).
Dataset Splits | Yes | In cases where overfitting is observed, early stopping is applied based on a held-out validation set. As no standard dataset split is available, we generate our own with 59,000, 4,000, and 7,000 images for training, validation, and testing, respectively.
Hardware Specification | Yes | These implementation choices allow for distributions to be trained using a batch-size of 64 without gradient checkpointing on a single NVIDIA TITAN X (Pascal) GPU.
Software Dependencies | No | The paper mentions the 'Adamax optimizer [23]' and the 'Glow architecture [24]' as components, but it does not specify software dependencies with version numbers for core libraries or environments (e.g., Python version, the deep learning framework such as PyTorch or TensorFlow and its version, or the CUDA version).
Experiment Setup | Yes | Training is done using the same Adamax optimizer [23] as in [24]. These implementation choices allow for distributions to be trained using a batch-size of 64 without gradient checkpointing on a single NVIDIA TITAN X (Pascal) GPU. Hyper-parameters are set to produce models with parameter counts similar to, but not exceeding, those of Glow [24] to enable a fair comparison.
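
As noted in the Pseudocode row, the method is specified through prose and equations rather than algorithm blocks. The sketch below illustrates the kind of multi-scale Haar decomposition the paper's factorization (roughly p(I) = p(I_0) ∏_i p(D_i | I_i) in equations 1 and 2) is built on. The function names, the orthonormal normalization, and the tensor layout are assumptions made here for illustration; they are not taken from the released code.

```python
# Illustrative sketch only: one level of a 2D Haar transform of the kind
# Wavelet Flow factorizes over. Names, normalization, and sign conventions
# are assumptions; the released code may differ.
import torch

def haar_level(x):
    """Split a batch of images (B, C, H, W), H and W even, into a
    half-resolution low-pass band and three detail bands."""
    a = x[:, :, 0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[:, :, 0::2, 1::2]  # top-right
    c = x[:, :, 1::2, 0::2]  # bottom-left
    d = x[:, :, 1::2, 1::2]  # bottom-right
    low      = (a + b + c + d) / 2.0  # coarser image I_i
    detail_1 = (a - b + c - d) / 2.0  # detail band 1
    detail_2 = (a + b - c - d) / 2.0  # detail band 2
    detail_3 = (a - b - c + d) / 2.0  # detail band 3
    return low, torch.stack([detail_1, detail_2, detail_3], dim=2)  # (B, C, 3, H/2, W/2)

def haar_pyramid(x, levels):
    """Repeatedly apply haar_level; Wavelet Flow models each detail tensor
    conditioned on the low-pass image at the same scale."""
    details = []
    for _ in range(levels):
        x, d = haar_level(x)
        details.append(d)
    return x, details  # base image I_0 and detail tensors, finest scale first
```

Because each conditional over detail coefficients is modeled by its own flow over a fixed wavelet decomposition, the scales can be trained independently, which is consistent with the Hardware Specification row's note that individual distributions fit a batch of 64 on a single GPU.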
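
The Dataset Splits row quotes a custom 59,000 / 4,000 / 7,000 train/validation/test split (the sizes sum to 70,000, matching the FFHQ image count). A minimal sketch of producing such a fixed split is below; the seed, the sorting step, and the function name are assumptions introduced here so the split is reproducible, not details reported in the paper.

```python
# Illustrative sketch of a fixed train/val/test split with the quoted sizes
# (59,000 / 4,000 / 7,000). The seed and deterministic ordering are
# assumptions made for reproducibility; they are not from the paper.
import random

def make_split(image_paths, seed=0):
    paths = sorted(image_paths)      # deterministic order before shuffling
    rng = random.Random(seed)
    rng.shuffle(paths)
    train = paths[:59_000]
    val   = paths[59_000:63_000]
    test  = paths[63_000:70_000]
    return train, val, test
```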
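
The Experiment Setup row quotes training with the Adamax optimizer and a batch-size of 64. Below is a minimal training-loop sketch under those settings; the learning rate, the device handling, and the assumption that the model exposes a log_prob method are illustrative choices, not values or APIs from the paper or its released code.

```python
# Illustrative sketch of the quoted optimization setup: Adamax, batch size 64.
# The learning rate and the model interface (a log_prob method) are
# assumptions for illustration only.
import torch
from torch.utils.data import DataLoader

def train(model, dataset, epochs=1, lr=1e-3, device="cuda"):
    loader = DataLoader(dataset, batch_size=64, shuffle=True)  # batch size from the quoted setup
    opt = torch.optim.Adamax(model.parameters(), lr=lr)        # optimizer family from the quoted setup
    model.to(device).train()
    for _ in range(epochs):
        for images in loader:
            images = images.to(device)
            nll = -model.log_prob(images).mean()  # negative log-likelihood in nats
            opt.zero_grad()
            nll.backward()
            opt.step()
    return model
```

The early stopping mentioned in the Dataset Splits row would wrap a loop like this with a periodic validation negative log-likelihood check.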