Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Invertible Hierarchical Generative Model for Images

Authors: Heikki Timonen, Miika Aittala, Jaakko Lehtinen

TMLR 2023

Reproducibility Variable Result LLM Response
Research Type Experimental We observe a significant increase in sample quality when compared with deep normalizing flow models, decreasing the Fréchet Inception Distance (FID) metric (Heusel et al., 2017) of Glow from 51.5 to 27.3 on the CelebA-HQ dataset (Karras et al., 2017) at 256×256 resolution, using a significantly smaller model with a much shorter training time. We demonstrate the ability to control individual levels of detail via the latent decomposition of our model. Project source code is available at https://github.com/timoneh/hflow. We also present ablation studies, showing how different architectural choices within our model change its behavior and performance in terms of FID. Finally, we train baseline Glow-like flow models with similar capacity to ours using the church and bedroom classes of the LSUN dataset (Yu et al., 2015).
Researcher Affiliation Collaboration Heikki Timonen (EMAIL), Department of Computer Science, Aalto University; Miika Aittala, NVIDIA; Jaakko Lehtinen, Department of Computer Science, Aalto University and NVIDIA
Pseudocode Yes Algorithm 1 Inference and sampling for the model in Figure 2
Require: Data point x, noise scale α
procedure Inference(x)
    y ← Encoder(x)
    σ ← σ_X                          ▷ Defined in Eq. 10
    ε ~ N(0, α²σ²I)
    y ← y + ε
    z_prior ← f_prior⁻¹(y)
    z_cond ← f_cond⁻¹(x; Decoder(y))
    z ← [z_prior, z_cond]
    return z
end procedure
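The inference procedure quoted above can be sketched in Python. This is a minimal illustration of the control flow only: `encoder`, `decoder`, `f_prior_inv`, and `f_cond_inv` are hypothetical stand-ins for the paper's learned networks and invertible flows, here replaced by trivial linear maps so the sketch runs.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # latent dimensionality (arbitrary for this sketch)

# Hypothetical stand-ins for the paper's learned components.
def encoder(x):            # maps input x to latent summary y
    return 0.5 * x

def decoder(y):            # produces the conditioning signal for f_cond
    return 2.0 * y

def f_prior_inv(y):        # inverse of the prior flow (identity here)
    return y.copy()

def f_cond_inv(x, cond):   # inverse of the conditional flow (toy affine map)
    return x - cond

def inference(x, alpha, sigma):
    """Algorithm 1 sketch: encode, perturb y with scaled Gaussian noise,
    then invert both flows and concatenate the latents."""
    y = encoder(x)
    eps = rng.normal(0.0, alpha * sigma, size=y.shape)  # ε ~ N(0, α²σ²I)
    y = y + eps
    z_prior = f_prior_inv(y)
    z_cond = f_cond_inv(x, decoder(y))
    return np.concatenate([z_prior, z_cond])

x = rng.normal(size=D)
z = inference(x, alpha=0.1, sigma=1.0)
print(z.shape)  # (16,)
```

With α = 0 the procedure is deterministic, which matches the algorithm's structure: all stochasticity enters only through the noise term ε added to y.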
Open Source Code Yes Project source code is available at https://github.com/timoneh/hflow.
Open Datasets Yes decreasing the Fréchet Inception Distance (FID) metric (Heusel et al., 2017) of Glow from 51.5 to 27.3 on the CelebA-HQ dataset (Karras et al., 2017) at 256×256 resolution We train our model also using the LSUN churches and bedrooms datasets at 128×128 resolution. ...LSUN dataset (Yu et al., 2015). Figure 11: Uncurated samples from model with Config A trained with FFHQ 256×256.
Dataset Splits No The paper mentions using specific datasets (CelebA-HQ, LSUN, FFHQ) and describes data augmentation techniques like adding uniform noise and random horizontal flips. It also states the number of samples used for FID calculation (25k or 30k).
Hardware Specification Yes Furthermore, we measure a throughput of around 50 samples / second on an NVIDIA RTX 3090 GPU, which is around 4 times the throughput of Glow on the same hardware. # GPUs: 2× V100 16 GB; # GPUs: 1× V100 16 GB
Software Dependencies No The paper mentions using
Experiment Setup Yes Table 2: Training details for Config A with CelebA-HQ/FFHQ 256×256
    Batch size: 16
    Batch size Var(X): 4
    Optimizer: Adam (Kingma & Ba, 2014) with β1 = 0.9, β2 = 0.999
    LR (encoders/decoders/flows): 5×10⁻⁴, 2×10⁻³, 5×10⁻³
    LR decay: multiplicative (encoders / decoders+flows) 0.92/0.95
    Encoder parameter freeze: at 60 epochs
    Gradient L2 clipping: 50.0
    # GPUs: 2× V100 16 GB
    Train time: 96 h
    Total parameter count: 80.15 M
Data Preprocessing: We augment each dataset by adding uniform 1/255 noise to 8-bit images normalized to [0, 1], on top of which we also add slight zero-mean Gaussian noise with standard deviation 5×10⁻³. During training, we apply random horizontal flips with probability p = 0.5.
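The preprocessing described in Table 2 can be sketched as a short numpy function. This is an illustrative reconstruction, not the authors' code: the function name, image shape, and HWC layout are assumptions; only the noise magnitudes (uniform 1/255 dequantization noise, Gaussian noise with std 5×10⁻³) and the flip probability p = 0.5 come from the quoted text.

```python
import numpy as np

rng = np.random.default_rng(0)

def preprocess(img_uint8, flip_p=0.5):
    """Illustrative sketch of the Table 2 augmentation: normalize an
    8-bit image to [0, 1], add uniform 1/255 dequantization noise,
    add zero-mean Gaussian noise (std 5e-3), and randomly flip."""
    x = img_uint8.astype(np.float64) / 255.0          # normalize to [0, 1]
    x += rng.uniform(0.0, 1.0 / 255.0, size=x.shape)  # uniform dequantization noise
    x += rng.normal(0.0, 5e-3, size=x.shape)          # slight zero-mean Gaussian noise
    if rng.random() < flip_p:                         # horizontal flip with probability p
        x = x[:, ::-1, :]                             # assumes HWC layout
    return x

img = rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8)
out = preprocess(img)
print(out.shape)  # (8, 8, 3)
```

Uniform dequantization noise is the standard trick for training continuous-density flow models on discrete 8-bit pixel data; the small extra Gaussian noise is a detail specific to this paper's recipe.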