Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Invertible Hierarchical Generative Model for Images

Authors: Heikki Timonen, Miika Aittala, Jaakko Lehtinen

TMLR 2023

Reproducibility Variable Result LLM Response
Research Type Experimental We observe a significant increase in sample quality when compared with deep normalizing flow models, decreasing the Fréchet Inception Distance (FID) metric (Heusel et al., 2017) of Glow from 51.5 to 27.3 on the CelebA-HQ dataset (Karras et al., 2017) at 256×256 resolution, using a significantly smaller model with a much shorter training time. We demonstrate the ability to control individual levels of detail via the latent decomposition of our model. Project source code is available at https://github.com/timoneh/hflow. We also present ablation studies, showing how different architectural choices within our model change its behavior and performance in terms of FID. Finally, we train baseline Glow-like flow models with similar capacity to ours using the church and bedroom classes of the LSUN dataset (Yu et al., 2015).
Researcher Affiliation Collaboration Heikki Timonen (EMAIL), Department of Computer Science, Aalto University; Miika Aittala, NVIDIA; Jaakko Lehtinen, Department of Computer Science, Aalto University and NVIDIA
Pseudocode Yes Algorithm 1 Inference and sampling for the model in Figure 2
Require: Data point x, noise scale α
procedure Inference(x)
    y ← Encoder(x)
    σ ← σ_X                          ▷ Defined in Eq. 10
    ε ~ N(0, α²σ²I)
    y ← y + ε
    z_prior ← f_prior⁻¹(y)
    z_cond ← f_cond⁻¹(x; Decoder(y))
    z ← [z_prior, z_cond]
    return z
end procedure
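The inference procedure quoted above can be sketched in Python. This is a minimal illustration of the control flow only: `encoder`, `decoder`, `f_prior_inv`, and `f_cond_inv` are hypothetical stand-ins for the paper's learned networks and invertible flows, here replaced by trivial linear maps so the sketch runs.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # latent dimensionality (arbitrary for this sketch)

# Hypothetical stand-ins for the paper's learned components.
def encoder(x):            # maps input x to latent summary y
    return 0.5 * x

def decoder(y):            # produces the conditioning signal for f_cond
    return 2.0 * y

def f_prior_inv(y):        # inverse of the prior flow (identity here)
    return y.copy()

def f_cond_inv(x, cond):   # inverse of the conditional flow (toy affine map)
    return x - cond

def inference(x, alpha, sigma):
    """Algorithm 1 sketch: encode, perturb y with scaled Gaussian noise,
    then invert both flows and concatenate the latents."""
    y = encoder(x)
    eps = rng.normal(0.0, alpha * sigma, size=y.shape)  # ε ~ N(0, α²σ²I)
    y = y + eps
    z_prior = f_prior_inv(y)
    z_cond = f_cond_inv(x, decoder(y))
    return np.concatenate([z_prior, z_cond])

x = rng.normal(size=D)
z = inference(x, alpha=0.1, sigma=1.0)
print(z.shape)  # (16,)
```

With α = 0 the procedure is deterministic, which matches the algorithm's structure: all stochasticity enters only through the noise term ε added to y.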
Open Source Code Yes Project source code is available at https://github.com/timoneh/hflow.
Open Datasets Yes decreasing the Fréchet Inception Distance (FID) metric (Heusel et al., 2017) of Glow from 51.5 to 27.3 on the CelebA-HQ dataset (Karras et al., 2017) at 256×256 resolution We train our model also using the LSUN churches and bedrooms datasets at 128×128 resolution. ...LSUN dataset (Yu et al., 2015). Figure 11: Uncurated samples from model with Config A trained with FFHQ 256×256.
Dataset Splits No The paper mentions using specific datasets (CelebA-HQ, LSUN, FFHQ) and describes data augmentation techniques like adding uniform noise and random horizontal flips. It also states the number of samples used for FID calculation (25k or 30k).
Hardware Specification Yes Furthermore, we measure a throughput of around 50 samples / second on an NVIDIA RTX 3090 GPU, which is around 4 times the throughput of Glow on the same hardware. # GPUs: 2× V100 16 GB; # GPUs: 1× V100 16 GB
Software Dependencies No The paper mentions using
Experiment Setup Yes Table 2: Training details for Config A with CelebA-HQ/FFHQ 256×256
    Batch size: 16
    Batch size Var(X): 4
    Optimizer: Adam (Kingma & Ba, 2014) with β1 = 0.9, β2 = 0.999
    LR (encoders/decoders/flows): 5×10⁻⁴, 2×10⁻³, 5×10⁻³
    LR decay: multiplicative (encoders / decoders+flows) 0.92/0.95
    Encoder parameter freeze: at 60 epochs
    Gradient L2 clipping: 50.0
    # GPUs: 2× V100 16 GB
    Train time: 96 h
    Total parameter count: 80.15 M
Data Preprocessing: We augment each dataset by adding uniform 1/255 noise to 8-bit images normalized to [0, 1], on top of which we also add slight zero-mean Gaussian noise with standard deviation 5×10⁻³. During training, we apply random horizontal flips with probability p = 0.5.
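The preprocessing described in Table 2 can be sketched as a short numpy function. This is an illustrative reconstruction, not the authors' code: the function name, image shape, and HWC layout are assumptions; only the noise magnitudes (uniform 1/255 dequantization noise, Gaussian noise with std 5×10⁻³) and the flip probability p = 0.5 come from the quoted text.

```python
import numpy as np

rng = np.random.default_rng(0)

def preprocess(img_uint8, flip_p=0.5):
    """Illustrative sketch of the Table 2 augmentation: normalize an
    8-bit image to [0, 1], add uniform 1/255 dequantization noise,
    add zero-mean Gaussian noise (std 5e-3), and randomly flip."""
    x = img_uint8.astype(np.float64) / 255.0          # normalize to [0, 1]
    x += rng.uniform(0.0, 1.0 / 255.0, size=x.shape)  # uniform dequantization noise
    x += rng.normal(0.0, 5e-3, size=x.shape)          # slight zero-mean Gaussian noise
    if rng.random() < flip_p:                         # horizontal flip with probability p
        x = x[:, ::-1, :]                             # assumes HWC layout
    return x

img = rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8)
out = preprocess(img)
print(out.shape)  # (8, 8, 3)
```

Uniform dequantization noise is the standard trick for training continuous-density flow models on discrete 8-bit pixel data; the small extra Gaussian noise is a detail specific to this paper's recipe.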