Continuous Mixtures of Tractable Probabilistic Models

Authors: Alvaro H.C. Correia, Gennaro Gala, Erik Quaeghebeur, Cassio de Campos, Robert Peharz

AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In experiments, we show that this simple scheme proves remarkably effective, as PCs learnt this way set new state of the art for tractable models on many standard density estimation benchmarks.
Researcher Affiliation | Academia | 1 Eindhoven University of Technology; 2 Graz University of Technology
Pseudocode | No | The paper describes methods and processes but does not include any clearly labeled pseudocode blocks or algorithm sections.
Open Source Code | Yes | Further experimental details can be found in Appendix A, and our source code is available at github.com/alcorreia/cm-tpm.
Open Datasets | Yes | We evaluated our method on common benchmarks for generative models, namely 20 standard density estimation datasets (Lowd and Davis 2010; Van Haaren and Davis 2012; Bekker et al. 2015) as well as 4 image datasets (Binary MNIST (Larochelle and Murray 2011), MNIST (LeCun et al. 1998), Fashion MNIST (Xiao, Rasul, and Vollgraf 2017) and Street View House Numbers (SVHN) (Netzer et al. 2011)).
Dataset Splits | Yes | We ran cm(SF) and cm(SCLT) and applied LO to both final models for up to 50 epochs, using early stopping on the validation set to avoid overfitting.
Hardware Specification | No | All models were developed in Python 3 with PyTorch (Paszke et al. 2019) and trained with standard commercial GPUs.
Software Dependencies | No | All models were developed in Python 3 with PyTorch (Paszke et al. 2019) and trained with standard commercial GPUs.
Experiment Setup | Yes | In this set of experiments, we fixed the mixing distribution p(z) to a 4-dimensional standard Gaussian and used N = 2^10 integration points during training. For the decoder we used 6-layer MLPs with LeakyReLU activations. ... We followed the same experimental protocol as in the previous experiments, except that we employed a larger latent dimensionality of 16 and increased the number of integration points during training to 2^14. We did not use convolutions and stuck to 6-layer MLPs. ... For both MNIST and SVHN data, we used the same architecture and trained cm(SF) models with 16 latent dimensions and K = 1 (see Efficient Learning). ... applied LO to both final models for up to 50 epochs, using early stopping on the validation set to avoid overfitting.
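The Experiment Setup row above pins down the core configuration: a 4-dimensional standard Gaussian mixing distribution p(z), N = 2^10 integration points during training, and a 6-layer LeakyReLU MLP decoder. Below is a minimal PyTorch sketch of that configuration, assuming fully factorized Bernoulli leaves (in the spirit of cm(SF)) and plain Monte Carlo samples standing in for whatever integration rule the authors actually use; the class, method, and argument names are illustrative and not taken from the released code.

```python
# Hedged sketch of a continuous mixture over fully factorized Bernoulli models,
# approximated with a finite set of integration points. Names and the sampling
# scheme are assumptions for illustration, not the authors' implementation.
import math
import torch
import torch.nn as nn

class ContinuousMixtureSketch(nn.Module):
    def __init__(self, n_features, latent_dim=4, hidden=256, n_layers=6):
        super().__init__()
        layers, d = [], latent_dim
        for _ in range(n_layers - 1):
            layers += [nn.Linear(d, hidden), nn.LeakyReLU()]
            d = hidden
        # final layer outputs logits of a fully factorized Bernoulli leaf
        layers += [nn.Linear(d, n_features)]
        self.decoder = nn.Sequential(*layers)
        self.latent_dim = latent_dim

    def log_likelihood(self, x, n_points=2 ** 10):
        # Draw integration points z_i ~ p(z) = N(0, I).
        z = torch.randn(n_points, self.latent_dim, device=x.device)
        logits = self.decoder(z)                                  # (N, D)
        # log p(x | z_i) for each sample and each integration point
        log_px_z = -nn.functional.binary_cross_entropy_with_logits(
            logits.unsqueeze(0).expand(x.size(0), -1, -1),        # (B, N, D)
            x.unsqueeze(1).expand(-1, n_points, -1),              # (B, N, D)
            reduction="none",
        ).sum(-1)                                                 # (B, N)
        # log p(x) ≈ log (1/N) Σ_i p(x | z_i)
        return torch.logsumexp(log_px_z, dim=1) - math.log(n_points)
```

With binarized MNIST-style inputs, the training objective would then be the mean negative log-likelihood, e.g. `-ContinuousMixtureSketch(784).log_likelihood(x).mean()` for a float batch `x` of shape (B, 784).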
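The "up to 50 epochs with early stopping on the validation set" protocol quoted in the Dataset Splits and Experiment Setup rows can be realized with a standard early-stopping loop. The sketch below reuses the log_likelihood method from the class above; the patience value, the Adam optimizer, and the data-loader names are assumptions rather than details reported in the paper.

```python
# Hedged sketch of fine-tuning for up to 50 epochs with early stopping on the
# validation set. Hyperparameters and helper names are illustrative assumptions.
import copy
import torch

def finetune_with_early_stopping(model, train_loader, valid_loader,
                                 max_epochs=50, patience=5, lr=1e-3):
    optim = torch.optim.Adam(model.parameters(), lr=lr)
    best_nll, best_state, bad_epochs = float("inf"), None, 0
    for epoch in range(max_epochs):
        model.train()
        for x in train_loader:                       # loaders yield float tensors
            optim.zero_grad()
            loss = -model.log_likelihood(x).mean()   # negative log-likelihood
            loss.backward()
            optim.step()
        # validation NLL drives early stopping
        model.eval()
        with torch.no_grad():
            val_nll = sum(-model.log_likelihood(x).sum().item() for x in valid_loader)
        if val_nll < best_nll:
            best_nll, best_state, bad_epochs = val_nll, copy.deepcopy(model.state_dict()), 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break
    if best_state is not None:
        model.load_state_dict(best_state)            # restore best validation model
    return model, best_nll
```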