Scaling Tractable Probabilistic Circuits: A Systems Perspective
Authors: Anji Liu, Kareem Ahmed, Guy Van den Broeck
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, PyJuice can be used to improve state-of-the-art PCs trained on image (e.g., ImageNet32) and language (e.g., WikiText, CommonGen) datasets. We further establish a new set of baselines on natural image and language datasets by benchmarking existing PC structures but with much larger sizes and more training epochs, with the hope of incentivizing future research. |
| Researcher Affiliation | Academia | Department of Computer Science, University of California, Los Angeles, USA. Correspondence to: Anji Liu <liuanji@cs.ucla.edu>. |
| Pseudocode | Yes | Algorithm 1 Forward pass of a sum layer group; Algorithm 2 Partition a layer into groups; Algorithm 3 Backward pass of a sum layer group w.r.t. parameters; Algorithm 4 Backward pass of a sum layer group w.r.t. inputs |
| Open Source Code | Yes | Code is available at https://github.com/Tractables/pyjuice. |
| Open Datasets | Yes | Empirically, PyJuice can be used to improve state-of-the-art PCs trained on image (e.g., ImageNet32) and language (e.g., WikiText, CommonGen) datasets. ... For the PC trained on the ImageNet32 dataset (Deng et al., 2009)... |
| Dataset Splits | Yes | Another set of 800K samples is drawn from the fine-tuned GPT as the validation set. |
| Hardware Specification | Yes | All experiments were carried out on an RTX 4090 GPU with 24GB memory. ... Experiments are conducted on a server with an AMD EPYC 7763 64-Core Processor and 8 RTX 4090 GPUs (we only use one GPU). ... we compare the runtime of PyJuice with the baselines on an NVIDIA A40 GPU. |
| Software Dependencies | No | The paper mentions custom CUDA kernels and uses deep learning frameworks implicitly (e.g., PyTorch given the GPU focus), but it does not provide specific version numbers for any key software libraries or dependencies, which is required for reproducibility. |
| Experiment Setup | Yes | To maximize parallelism, we always use the maximum possible batch size. ... we fine-tune the model with an equivalent batch size of 6400 and a step size of 0.01 in the mini-batch EM algorithm. Specifically, suppose θ are the current parameters and θnew are the new set of parameters computed by the EM update. Given step size α, the update formula is θ ← (1 − α)θ + α·θnew. |
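The Pseudocode row above lists the paper's layer-group algorithms by name only. For intuition, a sum layer in a probabilistic circuit computes, for each sum node, a weighted mixture of its children's probabilities; in log space this is a log-sum-exp over child log-probabilities plus log-weights. The sketch below is a minimal dense PyTorch illustration of that generic computation, not the paper's grouped/blocked GPU kernel; the function name and tensor layout are assumptions.

```python
import torch

def sum_layer_forward(child_log_probs: torch.Tensor,
                      log_weights: torch.Tensor) -> torch.Tensor:
    """Illustrative dense sum-layer forward pass in log space.

    child_log_probs: (batch, num_children) log-probabilities from the layer below.
    log_weights:     (num_sums, num_children) log of normalized edge weights.
    Returns:         (batch, num_sums) log-probabilities of the sum nodes.
    """
    # log sum_j w_ij * p_j  ==  logsumexp_j(log w_ij + log p_j)
    return torch.logsumexp(
        child_log_probs.unsqueeze(1) + log_weights.unsqueeze(0), dim=-1
    )
```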
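The Experiment Setup row quotes the mini-batch EM update θ ← (1 − α)θ + α·θnew. A minimal sketch of that damped update follows, assuming θnew is produced by an EM M-step on the current mini-batch; the function name and argument layout are illustrative, not PyJuice's actual API.

```python
import torch

def mini_batch_em_update(theta: torch.Tensor,
                         theta_new: torch.Tensor,
                         step_size: float = 0.01) -> torch.Tensor:
    """Damped mini-batch EM step: theta <- (1 - alpha) * theta + alpha * theta_new.

    theta:     current PC parameters (e.g., sum-edge weights).
    theta_new: parameters proposed by the EM M-step on the current mini-batch.
    step_size: alpha in the quoted formula (0.01 in the paper's fine-tuning setup).
    """
    return (1.0 - step_size) * theta + step_size * theta_new
```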