Preferential Normalizing Flows
Authors: Petrus Mikkola, Luigi Acerbi, Arto Klami
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We introduce a method for eliciting the expert's belief density as a normalizing flow based solely on preferential questions such as comparing or ranking alternatives. This allows eliciting in principle arbitrarily flexible densities, but flow estimation is susceptible to the challenge of collapsing or diverging probability mass that makes it difficult in practice. We tackle this problem by introducing a novel functional prior for the flow, motivated by a decision-theoretic argument, and show empirically that the belief density can be inferred as the function-space maximum a posteriori estimate. We demonstrate our method by eliciting multivariate belief densities of simulated experts, including the prior belief of a general-purpose large language model over a real-world dataset. |
| Researcher Affiliation | Academia | Petrus Mikkola, Luigi Acerbi, Arto Klami; Department of Computer Science, University of Helsinki; first.last@helsinki.fi |
| Pseudocode | Yes | Algorithm 1 (Full algorithm): require preferential data D_full; while not converged do: sample mini-batch D ⊂ D_full; ϕ ← ∇ϕ FS-Posterior(ϕ|D); end while. Algorithm 2 (FS-Posterior(ϕ|D)): require precision s; input: flow parameters ϕ, mini-batch D; X = design matrix of D; X⋆ = winner points of X; loglik = Σ log L(D | fϕ(X), s); logprior = Σ fϕ(X⋆); return loglik + logprior. (A hedged training-loop sketch follows the table.) |
| Open Source Code | Yes | Code for reproducing all experiments is available at https://github.com/petrus-mikkola/prefflow. |
| Open Datasets | Yes | We first fit a flow model to the continuous covariates of the regression data abalone [Nash et al., 1995], and then use the fitted flow as a ground-truth belief density in the elicitation experiment. ... California housing dataset [Pace and Barry, 1997] |
| Dataset Splits | No | The paper describes how it generates data (synthetic or LLM queries) for learning the flow, but it does not specify explicit training/validation/test splits of that data in the conventional sense. |
| Hardware Specification | Yes | Models are trained and evaluated on a server with nodes of two Intel Xeon processors, code name Cascade Lake, with 20 cores each running at 2.1 GHz. |
| Software Dependencies | No | The paper mentions using Real NVP, Neural Spline Flow, PyTorch (via the normflows package), and the Adamax optimizer, but it does not specify version numbers for these software components. |
| Experiment Setup | Yes | In all the experiments, we use the value s = 1 in the preferential likelihood regardless of how misspecified it is with respect to the ground-truth model. Neural Spline Flow models have 2 hidden layers and 128 hidden units; the number of flows is 6, 8, or 10 depending on the problem complexity. Real NVP models have 4 hidden layers and 2 hidden units; the number of flows is 36 when the number of rankings is more than 50, and 8 otherwise. Models are trained for a varying number of iterations from 10^5 to 5 * 10^5 with the Adamax optimizer [Kingma and Ba, 2014] and a batch size varying from 2 to 8. The learning rate varies from 10^-5 to 5 * 10^-5 depending on the problem dimensionality, with higher learning rates for higher-dimensional problems. A small weight decay of 10^-6 was applied. (A hedged normflows architecture sketch follows the table.) |
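The pseudocode row above compresses Algorithms 1 and 2 of the paper into a single cell. The sketch below shows how that loop could look in PyTorch, assuming a flow object that exposes `log_prob` as in the normflows package. The softmax-style preferential likelihood and the winner-point prior term are placeholders standing in for L(D | fϕ(X), s) and the paper's functional prior; `fs_posterior`, `train`, and `dataset.sample_minibatch` are hypothetical names, not the authors' implementation (see the linked repository for the real code).

```python
# Hedged sketch of Algorithms 1-2 (FS-MAP training loop); not the authors' code.
import torch

def fs_posterior(flow, X, winner_idx, s=1.0):
    """Unnormalized function-space posterior for one mini-batch (Algorithm 2 sketch).

    X          : (n_queries, k, d) tensor of the k alternatives in each query
    winner_idx : (n_queries,) long tensor, index of the preferred alternative
    s          : precision of the preferential likelihood
    """
    n, k, d = X.shape
    log_f = flow.log_prob(X.reshape(-1, d)).reshape(n, k)  # log f_phi at all design points
    # Placeholder preferential likelihood: softmax over log-densities scaled by s
    loglik = (s * log_f).log_softmax(dim=1).gather(1, winner_idx[:, None]).sum()
    # Placeholder functional prior evaluated at the winner points X*
    logprior = log_f.gather(1, winner_idx[:, None]).sum()
    return loglik + logprior

def train(flow, dataset, n_iters=100_000, batch_size=4, lr=5e-5, weight_decay=1e-6):
    """Algorithm 1 sketch: stochastic gradient ascent on the FS posterior."""
    opt = torch.optim.Adamax(flow.parameters(), lr=lr, weight_decay=weight_decay)
    for _ in range(n_iters):
        X, winner_idx = dataset.sample_minibatch(batch_size)  # hypothetical data loader
        opt.zero_grad()
        loss = -fs_posterior(flow, X, winner_idx)  # minimize the negative posterior
        loss.backward()
        opt.step()
```

The default hyperparameters mirror the ranges quoted in the Experiment Setup row (Adamax, learning rate around 5e-5, weight decay 1e-6), but they are illustrative rather than the paper's exact per-experiment settings.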
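The Experiment Setup row quotes flow hyperparameters but no pinned library versions. Below is one plausible construction of a Neural Spline Flow with those hyperparameters (2 hidden layers, 128 hidden units, 6 to 10 flow layers) using the normflows package; `build_neural_spline_flow` and its defaults are an assumption for illustration, not code from the paper.

```python
# Hedged sketch of the Neural Spline Flow architecture described above (normflows package).
import normflows as nf

def build_neural_spline_flow(dim, num_flows=8, hidden_layers=2, hidden_units=128):
    """Neural Spline Flow matching the hyperparameters quoted in the Experiment Setup row."""
    flows = []
    for _ in range(num_flows):
        # Rational-quadratic spline layer followed by an invertible linear permutation
        flows.append(nf.flows.AutoregressiveRationalQuadraticSpline(dim, hidden_layers, hidden_units))
        flows.append(nf.flows.LULinearPermute(dim))
    base = nf.distributions.DiagGaussian(dim, trainable=False)
    return nf.NormalizingFlow(q0=base, flows=flows)

# Usage (hypothetical): flow = build_neural_spline_flow(dim=7, num_flows=8),
# then optimize it with the training loop sketched above.
```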