Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Subtractive Mixture Models via Squaring: Representation and Learning
Authors: Lorenzo Loconte, Aleksanteri Mikulus Sladek, Stefan Mengel, Martin Trapp, Arno Solin, Nicolas Gillis, Antonio Vergari
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, iv) we provide empirical evidence (Sec. 5) that NPC²s can approximate distributions better than monotonic PCs for a variety of experimental settings involving learning from real-world data and distilling intractable models such as large language models to unlock tractable inference (Zhang et al., 2023). |
| Researcher Affiliation | Collaboration | Lorenzo Loconte¹, Aleksanteri M. Sladek², Stefan Mengel³, Martin Trapp², Arno Solin², Nicolas Gillis⁴, Antonio Vergari¹. ¹ School of Informatics, University of Edinburgh, UK; ² Department of Computer Science, Aalto University, Finland; ³ University of Artois, CNRS, Centre de Recherche en Informatique de Lens (CRIL), France; ⁴ Department of Mathematics and Operational Research, Université de Mons, Belgium |
| Pseudocode | Yes | Algorithm 1 squareTensorizedCircuit(ℓ, R) |
| Open Source Code | Yes | The source code, documentation, data sets and scripts needed to reproduce the results and figures, are available at https://github.com/april-tools/squared-npcs. |
| Open Datasets | Yes | In Sec. 5 we evaluate NPC²s for density estimation on five multivariate UCI data sets (Dua & Graff, 2017): Power (Hebrail & Berard, 2012), Gas (Fonollosa et al., 2015), Hepmass (Baldi et al., 2016), MiniBooNE (Roe et al., 2004) and BSDS300 patches (Martin et al., 2001) by following the pre-processing by Papamakarios et al. (2017). |
| Dataset Splits | Yes | Given p*(x), the distribution modeled by GPT2 over sentences x = [x₁, …, x_D] of maximum length D, we aim to minimize the Kullback-Leibler divergence KL[p* ‖ p], where p is modeled by a PC. Minimizing this divergence is equivalent to learning the PC by maximum likelihood on data sampled from GPT2. Therefore, following the experimental setting of Zhang et al. (2023), we sample a data set of 8M sentences using GPT2 with bounded length D = 32, i.e., with a maximum of 32 tokens. Then, we split these sentences into training, validation and test sets with proportions 0.85/0.05/0.10, respectively. |
| Hardware Specification | Yes | The benchmarks mentioned above and illustrated in Figs. C.1 to C.3 have been run on a single NVIDIA RTX A6000 with 48 GiB of memory. |
| Software Dependencies | No | The paper does not explicitly state specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | All models are learned by batched stochastic gradient descent using the Adam optimizer with default learning rate (Kingma & Ba, 2015) and a batch size of 256. The parameters of all mixtures are initialized by sampling uniformly between 0 and 1. Furthermore, monotonicity in (squared) PCs is ensured by exponentiating the parameters. |
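The idea behind NPC²s (Research Type and Pseudocode rows) is that squaring a model with possibly negative weights still yields a non-negative function. The sketch below shows the simplest possible case, not the authors' tensorized-circuit implementation: a single mixture c(x) = Σᵢ wᵢ N(x; μᵢ, σᵢ²) of univariate Gaussians with one negative weight, squared and normalized in closed form using the identity ∫ N(x; μᵢ, σᵢ²) N(x; μⱼ, σⱼ²) dx = N(μᵢ; μⱼ, σᵢ² + σⱼ²). The Gaussian components and the function names `gaussian_pdf` and `squared_mixture` are assumptions made for illustration.

```python
import numpy as np


def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x; broadcasts over NumPy arrays."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


def squared_mixture(weights, mus, sigmas):
    """Normalized density of c(x)^2 with c(x) = sum_i w_i N(x; mu_i, sigma_i^2).

    Weights may be negative (a subtractive mixture): squaring keeps the
    density non-negative everywhere.
    """
    w = np.asarray(weights, dtype=float)
    mu = np.asarray(mus, dtype=float)
    s = np.asarray(sigmas, dtype=float)
    # Z = sum_{i,j} w_i w_j * integral of N_i(x) N_j(x) dx
    #   = sum_{i,j} w_i w_j * N(mu_i; mu_j, sigma_i^2 + sigma_j^2)
    pair_sigma = np.sqrt(s[:, None] ** 2 + s[None, :] ** 2)
    overlap = gaussian_pdf(mu[:, None], mu[None, :], pair_sigma)
    z = w @ overlap @ w

    def density(x):
        x = np.asarray(x, dtype=float)
        c = gaussian_pdf(x[..., None], mu, s) @ w  # c(x) = sum_i w_i N_i(x)
        return c ** 2 / z

    return density, z


# Hypothetical example: a wide component minus a narrow one carves a dip at 0.
density, z = squared_mixture([1.0, -0.6], [0.0, 0.0], [1.0, 0.5])
xs = np.linspace(-10.0, 10.0, 20001)
mass = density(xs).sum() * (xs[1] - xs[0])  # Riemann-sum check; should be close to 1
```

Here c(0) is negative, yet the squared density stays non-negative and integrates to one; the full method applies the same squaring to deep circuits rather than to a flat mixture.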
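The Dataset Splits row gives only the 0.85/0.05/0.10 proportions for the 8M GPT2-sampled sentences (6.8M / 0.4M / 0.8M). A minimal sketch of such a split follows, assuming a random shuffle with a fixed seed; the excerpt does not say how the split was actually drawn, and `split_indices` is a hypothetical helper.

```python
import numpy as np


def split_indices(n, proportions=(0.85, 0.05, 0.10), seed=0):
    """Shuffle range(n) and cut it into train/validation/test index arrays.

    The test set takes whatever remains after train and validation, so the
    three parts always cover all n items exactly once.
    """
    rng = np.random.default_rng(seed)  # assumed seed; not stated in the source
    perm = rng.permutation(n)
    n_train = int(round(proportions[0] * n))
    n_valid = int(round(proportions[1] * n))
    train = perm[:n_train]
    valid = perm[n_train:n_train + n_valid]
    test = perm[n_train + n_valid:]
    return train, valid, test


# Small hypothetical example; with n = 8_000_000 the same code yields
# 6.8M / 0.4M / 0.8M indices.
train_idx, valid_idx, test_idx = split_indices(1000)
```

Assigning the remainder to the test set avoids losing or duplicating items when the proportions do not divide n evenly.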
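The Experiment Setup row combines four ingredients: parameters initialized uniformly in [0, 1], non-negativity (monotonicity) enforced by exponentiating them, Adam with its default settings (Kingma & Ba recommend a step size of 0.001, β₁ = 0.9, β₂ = 0.999, ε = 10⁻⁸) and mini-batches of 256. The toy sketch below wires these together for a monotonic mixture of two fixed Gaussians whose weights are exp(θ). The fixed components, the synthetic data, the step count, the seeds and all function names are hypothetical; the paper trains full PCs, not this model.

```python
import numpy as np


def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


# Hypothetical fixed components: two unit-variance Gaussians.
MUS = np.array([-2.0, 2.0])
SIGMAS = np.array([1.0, 1.0])


def avg_log_likelihood(theta, x):
    """Mean log-likelihood of the mixture with weights exp(theta) / sum(exp(theta))."""
    w = np.exp(theta)  # exponentiation keeps every weight strictly positive
    dens = gaussian_pdf(x[:, None], MUS, SIGMAS) @ w / w.sum()
    return np.mean(np.log(dens))


def grad_neg_ll(theta, x):
    """Gradient of the negative mean log-likelihood with respect to theta."""
    w = np.exp(theta)
    weighted = gaussian_pdf(x[:, None], MUS, SIGMAS) * w  # shape (n, K)
    resp = weighted / weighted.sum(axis=1, keepdims=True)  # responsibilities
    return -(resp.mean(axis=0) - w / w.sum())


def fit(x, steps=500, batch_size=256, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, seed=0):
    """Mini-batch Adam on theta; returns (initial theta, final theta)."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.0, 1.0, size=len(MUS))  # uniform initialization in [0, 1)
    theta0 = theta.copy()
    m = np.zeros_like(theta)
    v = np.zeros_like(theta)
    for t in range(1, steps + 1):
        batch = x[rng.integers(0, len(x), size=batch_size)]
        g = grad_neg_ll(theta, batch)
        m = beta1 * m + (1 - beta1) * g  # first-moment estimate
        v = beta2 * v + (1 - beta2) * g ** 2  # second-moment estimate
        m_hat = m / (1 - beta1 ** t)  # bias corrections
        v_hat = v / (1 - beta2 ** t)
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta0, theta


# Hypothetical data: 10% from the left component, 90% from the right one.
data_rng = np.random.default_rng(1)
x = np.concatenate([data_rng.normal(-2.0, 1.0, 100), data_rng.normal(2.0, 1.0, 900)])
theta0, theta = fit(x)
```

Because the weights are exp(θ), θ itself is unconstrained and Adam can update it freely, while the mixture weights stay positive throughout training.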