Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Efficient CDF Approximations for Normalizing Flows
Authors: Chandramouli Shama Sastry, Andreas M. Lehrmann, Marcus A. Brubaker, Alexander Radovic
TMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on popular flow architectures and UCI benchmark datasets show a marked improvement in sample efficiency as compared to traditional estimators. |
| Researcher Affiliation | Collaboration | Chandramouli Shama Sastry (Dalhousie University, Vector Institute, Borealis AI); Andreas M. Lehrmann (Borealis AI); Marcus A. Brubaker (York University, Vector Institute, Borealis AI); Alexander Radovic (Borealis AI) |
| Pseudocode | Yes | See Appendix B for a summary of the entire splitting process in pseudo code. |
| Open Source Code | Yes | The code to reproduce our results, including training popular normalizing flow architectures, approximating cumulative densities with the proposed adaptive boundary estimator, and other baseline methods, is publicly available: https://github.com/BorealisAI/nflow-cdf-approximations |
| Open Datasets | Yes | For the purpose of evaluation, we train normalizing flows on d-dimensional (d ∈ {2, 3, 4, 5}) data derived from 4 tabular datasets open sourced as part of the UCI Machine Learning Repository (Dua & Graff, 2017) and preprocessed as in Papamakarios et al. (2017): Power, Gas, Hepmass, and Miniboone. |
| Dataset Splits | No | The paper mentions obtaining "2 random d-dimensional slices of the dataset over which we train the normalizing flows" and creating "5 convex hulls for each choice of the radius" for evaluation. However, it does not provide specific training, validation, or test dataset splits in terms of percentages, sample counts, or references to standard pre-defined splits for the models being trained or evaluated. |
| Hardware Specification | Yes | In order to obtain a fair evaluation, we ran all of the timing experiments in a single non-preemptible job having access to 8 CPUs, 64GB RAM and one Tesla T4 GPU (16GB). |
| Software Dependencies | No | The paper mentions "PyTorch" but does not specify a version number or list any other software dependencies with their corresponding versions. |
| Experiment Setup | Yes | For all normalizing flows, we train the models with a batch size of 10k and stop when the log-likelihoods do not improve over 5 epochs. For the continuous flows, we used the exact divergence for computing the log-determinant. We used exp-scaling in the affine coupling layer of both MAF and Glow models and, in order to prevent numerical overflows, we applied a tanh nonlinearity before the exp-scaling. Finally, we used softplus as our activation function for both the Neural ODE and coupling networks. From Fig. 6 and Fig. 7, we observe both the Continuous and Discrete flows obtain similar log-likelihoods and are able to fit the training data well. For constructing discrete flows, we choose 3, 5 or 7 flow layers and construct coupling layers with 16, 32 or 64 hidden units. While one Glow layer corresponds to a sequence of (ActNorm) → (Glow Coupling) → (Invertible 1×1) transformations, one MAF layer corresponds to a sequence of (ActNorm) → (MAF Coupling) transformations. For continuous flows, we parameterize the neural ODE with 2 hidden layers, each consisting of 16, 32 or 64 hidden units. |
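The experiment-setup row mentions applying a tanh nonlinearity before exp-scaling in the affine coupling layers to prevent numerical overflow. A minimal NumPy sketch of that idea is below; `scale_net` and `shift_net` are hypothetical stand-ins for the paper's coupling networks, and the function is an illustration of the general technique, not the authors' implementation.

```python
import numpy as np

def softplus(x):
    # Numerically stable softplus, the activation used in the paper's
    # coupling and Neural ODE networks.
    return np.log1p(np.exp(-np.abs(x))) + np.maximum(x, 0.0)

def affine_coupling_forward(x, scale_net, shift_net, split):
    """Forward pass of an affine coupling transform with tanh-bounded
    exp-scaling. `scale_net`/`shift_net` map the conditioning half x1
    to per-dimension log-scales and shifts (hypothetical callables)."""
    x1, x2 = x[..., :split], x[..., split:]
    # tanh bounds the log-scale to (-1, 1), so exp() cannot overflow.
    log_s = np.tanh(scale_net(x1))
    y2 = x2 * np.exp(log_s) + shift_net(x1)
    y = np.concatenate([x1, y2], axis=-1)
    log_det = log_s.sum(axis=-1)  # log|det J| of the coupling transform
    return y, log_det
```

Because only x2 is transformed and x1 passes through unchanged, the inverse is closed-form: recompute `log_s` and the shift from x1, then invert the affine map on y2.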