Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Variational Pólya Tree

Authors: Lu Xu, Tsai Hor Chan, Lequan Yu, Kwok Lam, Guosheng Yin

NeurIPS 2025

Reproducibility Variable Result LLM Response
Research Type Experimental We evaluate the model performance on both real data and images, and demonstrate its competitiveness with other state-of-the-art deep density estimation methods. We also explore its ability in enhancing interpretability and uncertainty quantification. ... We validate VPT across diverse tasks, including high-dimensional tabular data and image density estimation using normalizing flow architectures. ... Empirical evaluations on various datasets demonstrate that our VPT prior can achieve superior performance in density estimation compared to existing methods.
Researcher Affiliation Academia Lu Xu, The University of Hong Kong, EMAIL; Tsai Hor Chan, University of Pennsylvania, Tsaihor.Chan@Pennmedicine.upenn.edu; Kwok Fai Lam, Hong Kong Metropolitan University, EMAIL; Lequan Yu, The University of Hong Kong, EMAIL; Guosheng Yin, The University of Hong Kong, EMAIL
Pseudocode Yes Algorithm 1: Training Procedure of Variational Pólya Tree (VPT)
1: Input: Data points $\{x_i\}_{i=1}^{N}$, dimension $D$, tree level $L$, number of epochs $n_{\text{epoch}}$, learning rate $\eta$.
2: Compute total number of nodes per dimension: $n_{\text{nodes}} = 2^{L} - 1$
3: Initialize Beta parameters for all nodes: $\alpha_{\epsilon_{1:j-1}0} = 1$, $\alpha_{\epsilon_{1:j-1}1} = 1$
4: for epoch $= 1, \dots, n_{\text{epoch}}$ do
5:   for each dimension $d = 1, \dots, D$ do
6:     for each node $j = 1, \dots, n_{\text{nodes}}$ do
7:       Sample split:
8:       $Y^{(d)}_{\epsilon_{1:j-1}0} \mid x \sim \mathrm{Beta}\big(\alpha^{(d)}_{\epsilon_{1:j-1}0}, \alpha^{(d)}_{\epsilon_{1:j-1}1}\big)$
9:       Compute partition intervals $B^{(d)}_{\epsilon_{1:j}}$.
10:     end for
11:   end for
12:   Compute joint posterior in Eq. (1),
13:   $p(\{\beta_j\}_{j=1}^{L}, Y_L \mid x)$, and
14:   $\mathcal{L}_{\text{VPT}} = \log p(\{\beta_j\}_{j=1}^{L}, Y_L \mid x)$
15:   Update $\alpha_{\epsilon_{1:j-1}0}$, $\alpha_{\epsilon_{1:j-1}1}$.
16: end for
Open Source Code Yes Code is available at https://github.com/howardchanth/var-polya-tree.
Open Datasets Yes We first illustrate the capability of our VPT prior using three common synthetic datasets, a ring of 8 Gaussians, two interwoven spirals, and a checkerboard pattern. ... We perform density estimation on five tabular UCI datasets, POWER, GAS, HEPMASS, MINIBOONE, and BSDS300. ... We further test our method on two image datasets: MNIST and CIFAR-10 [24]. ... In addition to the experiments in Section 4.3, we also conduct experiments on the SVHN dataset.
Dataset Splits Yes We follow the preprocessing procedure outlined in [39]. ... For this experiment, we employ a Block-NAF [6] as our feature learning architecture... The model is trained using the Adam optimizer... All models are trained until convergence, with a maximum of 1,000 epochs, stopping if there is no improvement on the validation set for 100 epochs. ... We employ a classic flow-based network NICE [8] as our feature learning backbone, and we use the same settings as in the original paper.
Hardware Specification Yes All experiments are conducted on a single RTX-3090 GPU.
Software Dependencies No The VPT is implemented with PyTorch.
Experiment Setup Yes For this experiment, we employ a Block-NAF [6] as our feature learning architecture, which has fewer parameters compared to the original neural autoregressive flow [18]. Consistent with the Block-NAF methodology, we train 5 stacked flows, each with 2 layers and 20D hidden units, where D represents the input dimension. The model is trained using the Adam optimizer, with a learning rate of $10^{-2}$ for the Block-NAF flow and 0.1 for the variational Pólya tree. All models are trained until convergence, with a maximum of 1,000 epochs, stopping if there is no improvement on the validation set for 100 epochs. ... The architecture consists of a stack of four coupling layers, with a diagonal positive scaling for the last stage. Each coupling function follows the same architecture: five hidden layers of 1,000 units for MNIST, and four layers of 2,000 units for SVHN and CIFAR-10. The NICE models are trained with Adam with learning rate $10^{-3}$, momentum 0.9, $\beta_2 = 0.01$, $\lambda = 1$ and $\epsilon = 10^{-4}$. Our VPT models are trained with Adam with learning rate 0.5.
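
As a rough illustration of steps 2 to 9 of Algorithm 1 in the Pseudocode row, the sketch below samples Beta-distributed splits for a single dimension and evaluates the resulting piecewise-constant density. It assumes a classical Pólya tree on the unit interval with fixed dyadic partitions, which may differ from the authors' implementation; the function names (`num_split_nodes`, `sample_splits`, `leaf_density`) are hypothetical and do not come from the paper or its repository. No variational update of the Beta parameters is performed here.

```python
import random


def num_split_nodes(levels):
    """Number of split nodes in a binary tree with `levels` levels: 2^L - 1."""
    return 2 ** levels - 1


def sample_splits(levels, alpha0=1.0, alpha1=1.0, rng=None):
    """Draw one left-branch probability Y ~ Beta(alpha0, alpha1) per node.

    Nodes are keyed by their binary path from the root ("" is the root,
    "0" its left child, "01" the right child of that node, and so on).
    Assumption: every node shares the same Beta parameters, matching the
    initialization alpha = 1 in step 3 of the algorithm.
    """
    rng = rng or random.Random()
    splits = {}
    for depth in range(levels):
        for idx in range(2 ** depth):
            prefix = format(idx, "b").zfill(depth) if depth else ""
            splits[prefix] = rng.betavariate(alpha0, alpha1)
    return splits


def leaf_density(x, splits, levels):
    """Density at x in [0, 1) under a dyadic Pólya tree draw.

    The mass of the leaf containing x is the product of the branch
    probabilities along its path; dividing by the leaf width gives the
    piecewise-constant density.
    """
    lo, hi = 0.0, 1.0
    prefix = ""
    mass = 1.0
    for _ in range(levels):
        mid = (lo + hi) / 2.0
        y = splits[prefix]
        if x < mid:
            mass *= y
            hi = mid
            prefix += "0"
        else:
            mass *= 1.0 - y
            lo = mid
            prefix += "1"
    return mass / (hi - lo)


if __name__ == "__main__":
    L = 3
    draw = sample_splits(L, rng=random.Random(0))
    print(num_split_nodes(L), leaf_density(0.3, draw, L))
```

Because the two branch probabilities at every node sum to one, the leaf masses sum to one as well, so each sampled draw is a valid density on the unit interval regardless of the Beta parameters.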
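
The stopping rule quoted in the Experiment Setup row (at most 1,000 epochs, halting after 100 epochs without validation improvement) can be sketched as follows. This is a minimal illustration under stated assumptions: the helper name `run_with_early_stopping` is hypothetical, "improvement" is taken to mean a strictly lower validation loss, and the stream of validation losses stands in for a real training loop. The paper does not specify these details.

```python
def run_with_early_stopping(validation_losses, max_epochs=1000, patience=100):
    """Consume per-epoch validation losses and apply patience-based stopping.

    Returns (best_epoch, best_loss, epochs_run), with epochs indexed from 0.
    Assumption: an epoch counts as an improvement only if its loss is
    strictly lower than the best loss seen so far.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_run = 0
    for epoch, loss in enumerate(validation_losses):
        if epoch >= max_epochs:
            break  # hard cap on the number of epochs
        epochs_run = epoch + 1
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` consecutive epochs
    return best_epoch, best_loss, epochs_run


if __name__ == "__main__":
    # Synthetic losses: improvement stops after the second epoch.
    losses = [1.0, 0.5] + [0.6] * 500
    print(run_with_early_stopping(losses))
```

In a real run, the model parameters from `best_epoch` would typically be checkpointed and restored after stopping; whether the authors do so is not stated in the quoted text.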