Scaling Tractable Probabilistic Circuits: A Systems Perspective

Authors: Anji Liu, Kareem Ahmed, Guy Van den Broeck

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, PyJuice can be used to improve state-of-the-art PCs trained on image (e.g., ImageNet32) and language (e.g., WikiText, CommonGen) datasets. We further establish a new set of baselines on natural image and language datasets by benchmarking existing PC structures but with much larger sizes and more training epochs, with the hope of incentivizing future research.
Researcher Affiliation | Academia | Department of Computer Science, University of California, Los Angeles, USA. Correspondence to: Anji Liu <liuanji@cs.ucla.edu>.
Pseudocode | Yes | Algorithm 1: Forward pass of a sum layer group; Algorithm 2: Partition a layer into groups; Algorithm 3: Backward pass of a sum layer group w.r.t. parameters; Algorithm 4: Backward pass of a sum layer group w.r.t. inputs (a log-space sketch of a sum-layer forward pass is given after the table).
Open Source Code | Yes | Code is available at https://github.com/Tractables/pyjuice.
Open Datasets | Yes | Empirically, PyJuice can be used to improve state-of-the-art PCs trained on image (e.g., ImageNet32) and language (e.g., WikiText, CommonGen) datasets. ... For the PC trained on the ImageNet32 dataset (Deng et al., 2009)...
Dataset Splits | Yes | Another set of 800K samples is drawn from the fine-tuned GPT as the validation set.
Hardware Specification | Yes | All experiments were carried out on an RTX 4090 GPU with 24GB memory. ... Experiments are conducted on a server with an AMD EPYC 7763 64-Core Processor and 8 RTX 4090 GPUs (we only use one GPU). ... we compare the runtime of PyJuice with the baselines on an NVIDIA A40 GPU.
Software Dependencies | No | The paper mentions custom CUDA kernels and implicitly relies on a deep learning framework (likely PyTorch, given the GPU focus), but it does not provide version numbers for any key software libraries or dependencies, which are required for reproducibility.
Experiment Setup | Yes | To maximize parallelism, we always use the maximum possible batch size. ... we fine-tune the model with an equivalent batch size of 6400 and a step size of 0.01 in the mini-batch EM algorithm. Specifically, suppose θ are the current parameters and θ_new is the new set of parameters computed by the EM update. Given step size α, the update formula is θ ← (1 − α)θ + αθ_new (a sketch of this update is given after the table).
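
The Pseudocode row above lists forward and backward passes over sum layer groups. As a rough illustration of what a sum-layer forward pass computes, here is a minimal log-space sketch in PyTorch; it is not the paper's grouped GPU kernel, and the tensor names and shapes are illustrative assumptions rather than anything from the PyJuice codebase.

```python
import torch

def sum_layer_forward(child_log_probs: torch.Tensor,
                      log_params: torch.Tensor) -> torch.Tensor:
    # child_log_probs: (batch, num_children) log-probabilities from the layer below
    # log_params:      (num_sums, num_children) log of the normalized sum weights
    # returns:         (batch, num_sums) log-probabilities of the sum nodes
    #
    # Each sum node computes p(sum_i) = sum_j w_ij * p(child_j); in log-space this
    # becomes a logsumexp over the children, which avoids numerical underflow.
    return torch.logsumexp(
        child_log_probs.unsqueeze(1) + log_params.unsqueeze(0), dim=-1
    )

# Illustrative usage with random, normalized tensors:
batch, num_children, num_sums = 4, 8, 3
child_log_probs = torch.log_softmax(torch.randn(batch, num_children), dim=-1)
log_params = torch.log_softmax(torch.randn(num_sums, num_children), dim=-1)
out = sum_layer_forward(child_log_probs, log_params)  # shape (4, 3)
```

Evaluating the circuit in the log domain is standard practice for PC implementations, since raw child probabilities can underflow for high-dimensional inputs.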
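The Experiment Setup row quotes the mini-batch EM update θ ← (1 − α)θ + αθ_new. A minimal sketch of that interpolation step follows; the names theta, theta_new, and alpha are chosen for illustration and are not taken from the paper or the PyJuice code.

```python
import torch

def minibatch_em_step(theta: torch.Tensor,
                      theta_new: torch.Tensor,
                      alpha: float = 0.01) -> torch.Tensor:
    # theta:     current PC parameters
    # theta_new: parameters computed by the EM update on the current mini-batch
    # alpha:     step size (0.01 in the quoted setup)
    # Interpolate toward the EM solution: theta <- (1 - alpha) * theta + alpha * theta_new
    return (1.0 - alpha) * theta + alpha * theta_new
```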