Improving Lossless Compression Rates via Monte Carlo Bits-Back Coding

Authors: Yangjun Ruan, Karen Ullrich, Daniel S Severo, James Townsend, Ashish Khisti, Arnaud Doucet, Alireza Makhzani, Chris Maddison

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate improved lossless compression rates in a variety of settings, especially in out-of-distribution or sequential data compression. We test our methods in various lossless compression settings, including compression of natural images and musical pieces using deep latent variable models. We report between 2% and 19% rate savings in our experiments, and we see our most significant improvements when compressing out-of-distribution data or sequential data.
Researcher Affiliation | Collaboration | (1) University of Toronto, (2) Vector Institute, (3) Facebook AI Research, (4) University College London, (5) University of Oxford.
Pseudocode | Yes | Appendix A contains pseudocode algorithms, such as "Algorithm 1: Extended Latent Space Representation of Importance Sampling", "Algorithm 2: Encoding with BB-CIS", and "Algorithm 3: Decoding with BB-CIS". A hedged sketch of the bits-back coding scheme these algorithms build on follows the table.
Open Source Code | Yes | Our implementation is available at https://github.com/ryoungj/mcbits.
Open Datasets | Yes | We benchmarked the performance of BB-IS and BB-CIS on the standard train-test splits of two datasets: an alphanumeric extension of MNIST called EMNIST (Cohen et al., 2017), and CIFAR-10 (Krizhevsky, 2009). We quantified the performance of BB-SMC on sequential data compression tasks with 4 polyphonic music datasets: Nottingham, JSB, MuseData, and Piano-midi.de (Boulanger-Lewandowski et al., 2012).
Dataset Splits | Yes | We benchmarked the performance of BB-IS and BB-CIS on the standard train-test splits of two datasets: an alphanumeric extension of MNIST called EMNIST (Cohen et al., 2017), and CIFAR-10 (Krizhevsky, 2009).
Hardware Specification | Yes | The experiment was run on a Tesla P100 GPU with 12GB of memory, together with an Intel Xeon Silver 4110 CPU at 2.10GHz.
Software Dependencies | No | The paper mentions using the "JAX framework" but does not provide specific version numbers for JAX or any other software dependencies.
Experiment Setup | Yes | Many of our experiments used continuous latent variable models and we adopted the maximum entropy quantization in Townsend et al. (2019) to discretize the latents. For each dataset, 3 VRNN models were trained with the ELBO, IWAE and FIVO objectives with 4 particles and were used with their corresponding coders for compression. The number of optimization steps was set to 50 and this method is denoted as BB-ELBO-IF (50). A hedged sketch of this quantization scheme also follows the table.
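
The algorithms listed under "Pseudocode" extend bits-back coding with importance sampling. As background only, here is a minimal sketch of the single-sample bits-back ANS scheme of Townsend et al. (2019) that BB-IS and BB-CIS generalize to multiple particles; the push/pop stack interface and the prior, likelihood, and posterior coding distributions are illustrative assumptions, not the authors' API.

# Minimal illustrative sketch, not the authors' implementation.
# `stack` is assumed to be an ANS-style last-in-first-out coder whose
# pop(dist) decodes a symbol under `dist` and push(symbol, dist) encodes one.
# prior, likelihood(z), and posterior(x) are assumed to be discretized
# coding distributions for p(z), p(x|z), and q(z|x) respectively.

def bb_encode(stack, x, prior, likelihood, posterior):
    """Append one symbol x to the stack using bits-back coding."""
    z = stack.pop(posterior(x))    # "get bits back": decode z ~ q(z|x) from existing bits
    stack.push(x, likelihood(z))   # encode x under p(x|z)
    stack.push(z, prior)           # encode z under p(z)
    return stack

def bb_decode(stack, prior, likelihood, posterior):
    """Recover x and return the borrowed bits to the stack."""
    z = stack.pop(prior)           # decode z under p(z)
    x = stack.pop(likelihood(z))   # decode x under p(x|z)
    stack.push(z, posterior(x))    # re-encode z under q(z|x), restoring the borrowed bits
    return x, stack

The net cost per symbol is roughly -log p(x|z) - log p(z) + log q(z|x) bits, whose expectation under q is the negative ELBO; BB-IS and BB-CIS tighten this toward importance-weighted bounds by coding several latent particles.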
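
The "Experiment Setup" row refers to the maximum entropy quantization of Townsend et al. (2019) for discretizing continuous latents: the latent space is divided into buckets of equal prior mass, so the discretized prior is uniform (maximum entropy). The sketch below assumes a standard-normal prior and a diagonal-Gaussian posterior; the bucket count and function names are illustrative choices, not taken from the released code.

# Illustrative sketch of maximum entropy quantization under a standard-normal
# prior. Bucket edges sit at prior quantiles i/n, so every bucket carries
# prior mass 1/n; the posterior mass per bucket gives the coding distribution
# used to decode/encode the quantized latent with bits-back.
import numpy as np
from scipy.stats import norm

def max_entropy_bucket_edges(n_buckets):
    """Edges at the prior quantiles, giving each bucket mass 1 / n_buckets."""
    return norm.ppf(np.linspace(0.0, 1.0, n_buckets + 1))  # endpoints are -inf / +inf

def posterior_bucket_masses(edges, post_mean, post_std):
    """Mass a Gaussian posterior q(z|x) assigns to each prior bucket."""
    cdf = norm.cdf(edges, loc=post_mean, scale=post_std)
    return np.diff(cdf)

edges = max_entropy_bucket_edges(n_buckets=16)
masses = posterior_bucket_masses(edges, post_mean=0.3, post_std=0.5)
print(masses.sum())  # ~= 1.0: a valid discrete coding distribution over buckets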