Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Embracing the chaos: analysis and diagnosis of numerical instability in variational flows

Authors: Zuheng Xu, Trevor Campbell

NeurIPS 2023 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We first empirically demonstrate that common flows can exhibit a catastrophic accumulation of error: the numerical flow map deviates significantly from the exact map which affects sampling and the numerical inverse flow map does not accurately recover the initial input which affects density and ELBO computations. ... Finally, we develop and empirically test a diagnostic procedure that can be used to validate results produced by numerically unstable flows in practice. ... In this section, we verify our error bounds and diagnostic procedure of MixFlow on the banana, cross, and 2 real data targets: a Bayesian linear regression and logistic problems.
Researcher Affiliation	Academia	Zuheng Xu Trevor Campbell Department of Statistics University of British Columbia [zuheng.xu | trevor]@stat.ubc.ca
Pseudocode	No	The paper does not contain any clearly labeled pseudocode or algorithm blocks. The methods are described in text and through mathematical equations.
Open Source Code	No	The paper does not explicitly state that code for their methodology is released or provide a link to it. It references third-party code like 'Julia package AdvancedHMC.jl [55]' which is used in the experiments, but not their own implementation of the proposed methods.
Open Datasets	Yes	For this problem, we use the Oxford Parkinson's Disease Telemonitoring Dataset [52]. ... We use a bank marketing dataset [53] downsampled to 400 data points.
Dataset Splits	No	The paper describes the datasets and mentions subsampling for two real-world datasets, but it does not specify explicit train/validation/test splits (percentages or sample counts) or reference standard splits for their particular use.
Hardware Specification	Yes	All experiments were conducted on a machine with an AMD Ryzen 9 3900X and 32G of RAM.
Software Dependencies	No	The paper mentions 'directly calling the eigmin function provided in Julia' and 'the Julia package AdvancedHMC.jl [55]', but it does not provide specific version numbers for Julia or any other software libraries or dependencies used in the experiments.
Experiment Setup	Yes	In terms of the settings of leapfrog integrators for each target, we used 200 leapfrog steps of size 0.02 and 60 leapfrog steps of size 0.005 for the banana and cross target distributions respectively, and used 40 leapfrog steps of size 0.0006 and 50 leapfrog steps of size 0.002 for the linear regression and logistic regression examples, respectively. The reference distribution q0 for each target is chosen to be the mean-field Gaussian approximation as used in [5].
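
Illustrative sketch (not from the paper): the leapfrog settings quoted in the Experiment Setup row can be written down as a small configuration and exercised with a generic leapfrog integrator. The code below is a minimal Python sketch under stated assumptions, not the authors' implementation, which used Julia and is not released. The names `LEAPFROG_SETTINGS`, `grad_log_density`, `leapfrog`, and `round_trip_error` are hypothetical, and the target is a standard Gaussian stand-in rather than any of the paper's targets. The round-trip check integrates forward, flips the momentum, integrates again, and measures how far the result lands from the starting point, which is the kind of numerical inverse error the Research Type row refers to.

```python
import numpy as np

# (number of leapfrog steps, step size) per target, as quoted in the Experiment Setup row.
LEAPFROG_SETTINGS = {
    "banana": (200, 0.02),
    "cross": (60, 0.005),
    "linear_regression": (40, 0.0006),
    "logistic_regression": (50, 0.002),
}


def grad_log_density(x):
    # Assumption: standard Gaussian stand-in target, NOT one of the paper's targets.
    return -x


def leapfrog(x, p, n_steps, step_size):
    """Apply n_steps leapfrog steps with unit mass to position x and momentum p."""
    x = np.asarray(x, dtype=float).copy()
    p = np.asarray(p, dtype=float).copy()
    for _ in range(n_steps):
        p = p + 0.5 * step_size * grad_log_density(x)  # half step in momentum
        x = x + step_size * p                          # full step in position
        p = p + 0.5 * step_size * grad_log_density(x)  # half step in momentum
    return x, p


def round_trip_error(x0, p0, n_steps, step_size):
    """Integrate forward, then invert via momentum flip, and return the distance to the start."""
    x0 = np.asarray(x0, dtype=float)
    p0 = np.asarray(p0, dtype=float)
    x1, p1 = leapfrog(x0, p0, n_steps, step_size)
    # Leapfrog is time-reversible in exact arithmetic: flip momentum, integrate, flip back.
    x2, p2_flipped = leapfrog(x1, -p1, n_steps, step_size)
    p2 = -p2_flipped
    return float(np.linalg.norm(np.concatenate([x2 - x0, p2 - p0])))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x0, p0 = rng.standard_normal(2), rng.standard_normal(2)
    for target, (n_steps, step_size) in LEAPFROG_SETTINGS.items():
        err = round_trip_error(x0, p0, n_steps, step_size)
        print(f"{target}: {n_steps} steps of size {step_size}, round-trip error {err:.3e}")
```

On this well-conditioned stand-in the round-trip error should stay close to floating-point precision. The paper's concern is with flows where such errors accumulate, so swapping in a harder target or composing many such maps would be the natural next experiment with this sketch.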