reproducibilityindex.ai

Practical Shuffle Coding

Authors: Julius Kunze, Daniel Severo, Jan-Willem van de Meent, James Townsend

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experiments show that our high-performance implementation matches the optimal compression rates from Severo et al. (2023a) for multisets, but is orders of magnitude faster.
Researcher Affiliation	Academia	Julius Kunze University College London juliuskunze@gmail.com Daniel Severo University of Toronto and Vector Institute d.severo@mail.utoronto.ca Jan-Willem van de Meent University of Amsterdam j.w.vandemeent@uva.nl James Townsend University of Amsterdam j.h.n.townsend@uva.nl
Pseudocode	Yes	1 def encode(m, f): Effect on message length: 2 if i=0: return m ... 12 def decode(m): 13 if i=0: return m, f0 ...
Open Source Code	Yes	We release source code1 which can be extended easily to new models and classes of unordered objects other than multisets and graphs. 1Source code, data and results are available at https://github.com/juliuskunze/shuffle-coding.
Open Datasets	Yes	We applied incomplete joint and autoregressive shuffle coding with color refinement according to Definition C.1 to various graph datasets. We use the simple Erd os-Rényi (ER) G(n, p) model for P, which is straightforward to convert into an autoregressive model P(fi \| f[i]). Kunze et al. (2024) observe that the Pólya urn (PU) preferential attachment model proposed by Severo et al. (2023b) drastically improves the compression rate on SZIP graphs from Choi and Szpankowski (2012). ... In Tables 8 and 9, we compare complete and incomplete joint shuffle coding on the TU (Morris et al., 2020) and SZIP datasets.
Dataset Splits	No	The paper does not explicitly provide details about a validation dataset split (e.g., percentages, sample counts, or specific pre-defined splits) separate from training and testing.
Hardware Specification	Yes	All shuffle coding speeds in this paper were measured on a Mac Book Pro 2018 with a 2.7GHz Intel Core i7 CPU.
Software Dependencies	No	The paper mentions various software components and libraries, such as 'nauty and Traces library' and 'weighted AVL tree', but does not provide specific version numbers for these ancillary software dependencies required for replication.
Experiment Setup	Yes	We observe that incomplete shuffle coding leads to dramatic speedups of up to a factor of one million on some of these graphs, with a minimal increase in compression rate across all datasets. ... For incomplete autoregressive graph shuffle coding, we apply chunking to the prefixing chain from Example 4.2. We first evaluate the effect of chunk size in Appendix I on SZIP graphs, and find that c = 16 uniformly sized chunks lead to a good balance between runtime and rate, which we will use in all following experiments.