Lossless Compression with Probabilistic Circuits
Authors: Anji Liu, Stephan Mandt, Guy Van den Broeck
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, our PC-based (de)compression algorithm runs 5-40 times faster than neural compression algorithms that achieve similar bitrates. By scaling up the traditional PC structure learning pipeline, we achieve state-of-the-art results on image datasets such as MNIST. |
| Researcher Affiliation | Academia | Anji Liu, CS Department, UCLA (liuanji@cs.ucla.edu); Stephan Mandt, CS Department, University of California, Irvine (mandt@uci.edu); Guy Van den Broeck, CS Department, UCLA (guyvdb@cs.ucla.edu) |
| Pseudocode | Yes | Algorithm 1 Compute F(x) (see Alg. 3 for details). (A sketch of this left-cumulative computation appears after the table.) |
| Open Source Code | Yes | Our open-source implementation of the PC-based (de)compression algorithm can be found at https://github.com/Juice-jl/PressedJuice.jl. |
| Open Datasets | Yes | Our experiments show that on MNIST and EMNIST, the PC-based compression algorithm achieved SoTA bitrates. On more complex data such as subsampled ImageNet, we hybridize PCs with normalizing flows and show that PCs can significantly improve the bitrates of the base normalizing flow models. |
| Dataset Splits | No | The paper mentions using datasets like MNIST but does not provide specific training/validation/test split percentages or sample counts, nor does it refer to predefined splits with citations for reproducibility beyond generic dataset names. |
| Hardware Specification | Yes | The compression (resp. decompression) times are the total computation time used to encode (resp. decode) all 10,000 MNIST test samples on a single TITAN RTX GPU. All experiments are performed on a server with 72 CPUs, 512 GB memory, and 2 TITAN RTX GPUs. In all experiments, we only use a single GPU on the server. |
| Software Dependencies | No | The paper mentions software like PyTorch, Juice.jl, and rANS, but does not provide specific version numbers for these software dependencies as used in their experiments. |
| Experiment Setup | Yes | For the PCs, we adopted EiNets (Peharz et al., 2020a) with hyperparameters K = 12 and R = 4. Instead of using random binary trees to define the model architecture, we used binary trees in which latent variables that are adjacent in z are placed close together in the tree. Parameter learning was performed in the following steps. First, compute the average log-likelihood over a mini-batch of samples; the negative average log-likelihood is the loss. Second, compute the gradients w.r.t. all model parameters by backpropagating the loss. Finally, update the IDF and the PCs individually using the gradients: for the IDF, following Hoogeboom et al. (2019), the Adamax optimizer was used; for the PCs, following Peharz et al. (2020a), the gradients were used to compute the EM targets of the parameters, followed by mini-batch EM updates. (A hedged sketch of this joint update appears after the table.) |
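
The quoted Algorithm 1 computes F(x), the left cumulative probability Pr(X < x) under the model's lexicographic variable ordering, which a streaming entropy coder such as rANS then consumes. Below is a minimal Python sketch of the standard decomposition F(x) = Σ_i Pr(X_<i = x_<i, X_i < x_i); the `prefix_lt_prob` callback and the independent-Bernoulli toy distribution are illustrative stand-ins, not the paper's PC marginal queries.

```python
import numpy as np

def left_cumulative(x, prefix_lt_prob):
    """F(x) = Pr(X < x) in lexicographic order, via the decomposition
    F(x) = sum_i Pr(X_1 = x_1, ..., X_{i-1} = x_{i-1}, X_i < x_i).

    `prefix_lt_prob(prefix, v)` is an illustrative marginal-query callback;
    in the paper this role is played by feedforward passes of the PC."""
    return sum(prefix_lt_prob(x[:i], x[i]) for i in range(len(x)))

# Toy stand-in distribution: independent Bernoulli pixels, p[i] = Pr(X_i = 1).
p = np.array([0.3, 0.6, 0.8])

def bernoulli_prefix_lt(prefix, v):
    # Pr(X_<i = prefix) * Pr(X_i < v); for binary values, X_i < v only when v = 1.
    prefix_prob = np.prod([p[j] if b == 1 else 1.0 - p[j] for j, b in enumerate(prefix)])
    lt_prob = (1.0 - p[len(prefix)]) if v == 1 else 0.0
    return prefix_prob * lt_prob

F = left_cumulative([1, 0, 1], bernoulli_prefix_lt)  # 0.724, fed to an rANS coder
```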
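
The joint parameter-learning loop described in the experiment-setup row (mini-batch negative log-likelihood, backpropagation, then an Adamax step for the IDF and a mini-batch EM step for the PC) could be organized roughly as in the following PyTorch-style sketch. The names `idf`, `pc`, `pc.log_prob`, and `pc.em_update` are hypothetical placeholders, not the paper's implementation or the Juice.jl/EiNet APIs.

```python
import torch

# Hypothetical placeholders: `idf` is an integer discrete flow mapping images x
# to discrete latents z (a bijection, so log p(x) = log p_PC(z)); `pc` is an
# EiNet-style probabilistic circuit over z with a mini-batch EM update routine.
optimizer = torch.optim.Adamax(idf.parameters(), lr=1e-3)

def train_step(batch):
    # 1. Loss: negative average log-likelihood over the mini-batch.
    z = idf(batch)                  # discrete latents (straight-through gradients assumed)
    loss = -pc.log_prob(z).mean()

    # 2. Backpropagate to obtain gradients for both the flow and the circuit.
    optimizer.zero_grad()
    loss.backward()

    # 3. Update each component with its own scheme: an Adamax step for the IDF,
    #    and a mini-batch EM step for the PC, where the accumulated gradients
    #    are turned into EM targets (expected sufficient statistics).
    optimizer.step()
    pc.em_update()
    return loss.item()
```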