Practical Shuffle Coding

Authors: Julius Kunze, Daniel Severo, Jan-Willem van de Meent, James Townsend

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Our experiments show that our high-performance implementation matches the optimal compression rates from Severo et al. (2023a) for multisets, but is orders of magnitude faster.
Researcher Affiliation Academia Julius Kunze University College London juliuskunze@gmail.com Daniel Severo University of Toronto and Vector Institute d.severo@mail.utoronto.ca Jan-Willem van de Meent University of Amsterdam j.w.vandemeent@uva.nl James Townsend University of Amsterdam j.h.n.townsend@uva.nl
Pseudocode Yes 1 def encode(m, f): Effect on message length: 2 if i=0: return m ... 12 def decode(m): 13 if i=0: return m, f0 ...
Open Source Code Yes We release source code1 which can be extended easily to new models and classes of unordered objects other than multisets and graphs. 1Source code, data and results are available at https://github.com/juliuskunze/shuffle-coding.
Open Datasets Yes We applied incomplete joint and autoregressive shuffle coding with color refinement according to Definition C.1 to various graph datasets. We use the simple Erd os-Rényi (ER) G(n, p) model for P, which is straightforward to convert into an autoregressive model P(fi | f[i]). Kunze et al. (2024) observe that the Pólya urn (PU) preferential attachment model proposed by Severo et al. (2023b) drastically improves the compression rate on SZIP graphs from Choi and Szpankowski (2012). ... In Tables 8 and 9, we compare complete and incomplete joint shuffle coding on the TU (Morris et al., 2020) and SZIP datasets.
Dataset Splits No The paper does not explicitly provide details about a validation dataset split (e.g., percentages, sample counts, or specific pre-defined splits) separate from training and testing.
Hardware Specification Yes All shuffle coding speeds in this paper were measured on a Mac Book Pro 2018 with a 2.7GHz Intel Core i7 CPU.
Software Dependencies No The paper mentions various software components and libraries, such as 'nauty and Traces library' and 'weighted AVL tree', but does not provide specific version numbers for these ancillary software dependencies required for replication.
Experiment Setup Yes We observe that incomplete shuffle coding leads to dramatic speedups of up to a factor of one million on some of these graphs, with a minimal increase in compression rate across all datasets. ... For incomplete autoregressive graph shuffle coding, we apply chunking to the prefixing chain from Example 4.2. We first evaluate the effect of chunk size in Appendix I on SZIP graphs, and find that c = 16 uniformly sized chunks lead to a good balance between runtime and rate, which we will use in all following experiments.