Scaling Continuous Latent Variable Models as Probabilistic Integral Circuits

Authors: Gennaro Gala, Cassio P. de Campos, Antonio Vergari, Erik Quaeghebeur

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In extensive experiments, we showcase the effectiveness of functional sharing and the superiority of QPCs over traditional PCs." and "In our experiments, we first benchmark the effectiveness of functional sharing for scaling the training of PICs via numerical quadrature, comparing it with standard PCs and PICs w/o functional sharing [18]. Then, following prior work [10, 32, 33, 18], we compare QPCs and PCs as distribution estimators on several image datasets."
Researcher Affiliation | Academia | "1 Eindhoven University of Technology, NL; 2 School of Informatics, University of Edinburgh, UK"
Pseudocode | Yes | "Algorithm 1 RG2PIC(R)", "Algorithm 2 merge(u1, u2, ρ)", and "Algorithm 3 PIC2QPC(c, z, ew)"
Open Source Code | Yes | "Our code is available at github.com/gengala/tenpics."
Open Datasets | Yes | "We begin with the MNIST-family, which includes 6 datasets of gray-scale 28x28 images: MNIST [29], FashionMNIST [55], and EMNIST with its 4 splits [7]. Then, we move to larger RGB image datasets as CIFAR [28], ImageNet32, ImageNet64 [14], and CelebA [34]."
Dataset Splits | No | "For each dataset, we perform a training cycle of T optimization steps, after which we perform a validation step and stop training if the validation log-likelihood did not improve by δ nats after 5 training cycles." The paper mentions a validation step but does not explicitly provide the specific percentages, sample counts, or methodology for creating the validation split. (A sketch of this early-stopping rule follows the table.)
Hardware Specification | Yes | "We use an NVIDIA A100 40GB throughout our experiments."
Software Dependencies | No | The paper provides PyTorch code examples but does not explicitly state the version numbers for PyTorch, Python, CUDA, or any other key software dependencies used in the experiments. (A version-recording sketch follows the table.)
Experiment Setup | Yes | "We use Adam [23] and a batch size of 256 for all experiments. After some preliminary runs, we found that a learning rate of 5e-3 worked best, which we annealed towards 10e-4 using cosine annealing with warm restarts across 500 optimization steps [38]. We also apply weight decay with λ = 0.01." and "After some preliminary runs, we found that a constant learning rate of 0.01 worked best for all PC models, and for all datasets. We keep the PC parameters unnormalized, and, as such, we clamp them to a small positive value (10^-19) after each Adam update to keep them non-negative, and subtract the log normalization constant to normalize the log-likelihoods." (A sketch of both configurations follows the table.)
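
The Dataset Splits quote describes the stopping rule but not the split itself: train for a cycle of T optimization steps, run a validation step, and stop once the validation log-likelihood has failed to improve by δ nats for 5 consecutive cycles. The following is a minimal sketch of that patience loop as we read it, not the authors' code; train_cycle, validation_ll, T, and delta are hypothetical stand-ins, and the values of T and δ are not stated in the quote.

```python
def train_with_patience(train_cycle, validation_ll, T, delta, patience=5):
    """Early stopping as quoted: run cycles of T optimization steps and stop
    once the validation log-likelihood has not improved by `delta` nats for
    `patience` consecutive cycles. `train_cycle` and `validation_ll` are
    hypothetical callables standing in for model-specific training/evaluation."""
    best_ll = float("-inf")
    stale_cycles = 0
    while stale_cycles < patience:
        train_cycle(T)              # one training cycle of T optimization steps
        ll = validation_ll()        # validation log-likelihood, in nats
        if ll > best_ll + delta:    # improved by at least delta nats
            best_ll = ll
            stale_cycles = 0
        else:
            stale_cycles += 1
    return best_ll
```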
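Since the Software Dependencies row flags missing version numbers, here is a minimal sketch of how the environment used for a run could be recorded. This is a generic reproducibility aid, not something the paper provides.

```python
import sys
import torch

# Record the interpreter, library, and CUDA versions actually used for a run,
# since the paper does not state them explicitly.
print("python:", sys.version.split()[0])
print("torch:", torch.__version__)
print("cuda (torch build):", torch.version.cuda)
print("cudnn:", torch.backends.cudnn.version())
if torch.cuda.is_available():
    print("gpu:", torch.cuda.get_device_name(0))  # the paper reports an NVIDIA A100 40GB
```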
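The two quoted configurations in the Experiment Setup row translate fairly directly into standard PyTorch calls. The sketch below is our reconstruction under those quotes, not the released training code: the model, data, and objective are placeholders, and only the optimizer, scheduler, batch size, and clamping step follow the quoted values.

```python
import torch

# Placeholder model and data; the actual models are the paper's PICs/QPCs and PC baselines.
model = torch.nn.Linear(784, 1)
data = torch.utils.data.TensorDataset(torch.randn(1024, 784))
loader = torch.utils.data.DataLoader(data, batch_size=256, shuffle=True)  # batch size 256, as quoted

# Quoted PIC/QPC setup: Adam, lr 5e-3, weight decay 0.01, cosine annealing with warm
# restarts across 500 optimization steps (the quoted "10e-4" floor would be eta_min;
# it is left at the PyTorch default here).
optimizer = torch.optim.Adam(model.parameters(), lr=5e-3, weight_decay=0.01)
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=500)

for (x,) in loader:
    loss = -model(x).mean()      # stand-in objective; the paper maximizes the log-likelihood
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()

# Quoted PC-baseline setup: constant lr 0.01; parameters are kept unnormalized and
# clamped to a small positive value (1e-19) after every Adam update to stay non-negative.
pc_optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
for (x,) in loader:
    loss = -model(x).mean()
    pc_optimizer.zero_grad()
    loss.backward()
    pc_optimizer.step()
    with torch.no_grad():
        for p in model.parameters():
            p.clamp_(min=1e-19)
```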