Entropy Coding of Unordered Data Structures

Authors: Julius Kunze, Daniel Severo, Giulio Zani, Jan-Willem van de Meent, James Townsend

ICLR 2024

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | We release an implementation that can easily be adapted to different data types and statistical models, and demonstrate that our implementation achieves state-of-the-art compression rates on a range of graph datasets including molecular data. ... To demonstrate the method experimentally, we first applied it to the TUDatasets graphs (Morris et al., 2020), with a very simple Erdős–Rényi G(n, p) model for P. Table 2 shows a summary, highlighting the significance of the discount achieved by shuffle coding. ... We also compared directly to Bouritsas et al. (2021), who used a more sophisticated neural method to compress graphs (upper part of Table 3).
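The discount mentioned in the quote above comes from coding each graph as unordered rather than ordered: shuffle coding saves roughly log2(n!) bits, less log2 of the automorphism group order, on top of the ordered G(n, p) codelength. As a rough illustration (a sketch only; the function names and example numbers below are mine, not the paper's implementation):

```python
import math

def er_codelength_bits(n, num_edges, p):
    """Ideal codelength (bits) for an ordered simple graph on n labeled
    vertices under an Erdos-Renyi G(n, p) model: each of the n*(n-1)/2
    possible edges is present independently with probability p."""
    pairs = n * (n - 1) // 2
    return -(num_edges * math.log2(p) + (pairs - num_edges) * math.log2(1 - p))

def shuffle_coding_discount_bits(n, aut_order=1):
    """Savings from coding the graph as unordered: log2(n!) minus
    log2 |Aut(G)|, since vertex orderings related by an automorphism
    describe the same unordered graph."""
    return math.log2(math.factorial(n)) - math.log2(aut_order)

# Example: a 10-vertex graph with 15 edges and a trivial automorphism group.
ordered = er_codelength_bits(10, 15, p=15 / 45)
discount = shuffle_coding_discount_bits(10)
print(f"ordered: {ordered:.1f} bits, unordered: {ordered - discount:.1f} bits")
```

For small graphs the discount is a large fraction of the total rate, which is why Table 2 in the paper highlights it.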
Researcher Affiliation | Academia | Julius Kunze, University College London (juliuskunze@gmail.com); Daniel Severo, University of Toronto and Vector Institute (d.severo@mail.utoronto.ca); Giulio Zani, University of Amsterdam (g.zani@uva.nl); Jan-Willem van de Meent, University of Amsterdam (j.w.vandemeent@uva.nl); James Townsend, University of Amsterdam (j.h.n.townsend@uva.nl)
Pseudocode | Yes | The following is an implementation of shuffle coding, showing, on the right, the effect of the steps on message length. 1 def encode(m, f): Effect on message length: ... 8 def decode(m): ...
Open Source Code | Yes | We release source code (footnote 1) with straightforward interfaces to enable future applications of shuffle coding with more sophisticated models and to classes of unordered objects other than graphs. (Footnote 1: Source code, data and results are available at https://github.com/juliuskunze/shuffle-coding.)
Open Datasets | Yes | To demonstrate the method experimentally, we first applied it to the TUDatasets graphs (Morris et al., 2020)...
Dataset Splits | No | Because PnC requires training, it was evaluated on a random test subset of each dataset, whereas shuffle coding was evaluated on entire datasets.
Hardware Specification | Yes | All shuffle coding speeds are for a single thread on a MacBook Pro 2018 with a 2.7 GHz Intel Core i7 CPU.
Software Dependencies | No | For graphs, the canonical orderings we use are computed using the nauty and Traces libraries (McKay and Piperno, 2014). ... Routines for constructing and working with stabilizer chains are standard in computational group theory, and are implemented in SymPy (https://www.sympy.org/), as well as the GAP system (https://www.gap-system.org/)...
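The stabilizer chains mentioned in this row can be explored directly in SymPy. A minimal sketch (the graph choice is mine, used only for illustration): the automorphism group of a 4-cycle, with the first link of its stabilizer chain exhibited via the orbit-stabilizer relation |G| = |orbit(0)| · |stab(0)|.

```python
from sympy.combinatorics import Permutation, PermutationGroup

# Automorphism group of the 4-cycle 0-1-2-3-0, generated by a rotation
# and a reflection: the dihedral group of order 8.
rotation = Permutation([1, 2, 3, 0])
reflection = Permutation([0, 3, 2, 1])
G = PermutationGroup([rotation, reflection])

# First link of a stabilizer chain: |G| = |orbit(0)| * |stabilizer(0)|.
orbit = G.orbit(0)
stab = G.stabilizer(0)
print(G.order())     # 8
print(len(orbit))    # 4: rotations move vertex 0 to any position
print(stab.order())  # 2: the identity and the reflection fixing vertex 0
```

Iterating this construction down the chain is exactly what Schreier-Sims-style routines in SymPy and GAP automate.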
Experiment Setup | No | The paper describes the statistical models used (Erdős–Rényi G(n, p), Pólya urn) and how empirical probability vectors are computed, but does not provide details such as learning rates, batch sizes, or epoch counts typically associated with experimental setups for learned models.
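The "empirical probability vectors" referred to here reduce, for the simplest case, to estimating model parameters from the dataset itself. A hedged sketch (the helper name and toy data are mine): the maximum-likelihood edge probability for a G(n, p) model fit across a dataset of graphs is total observed edges over total possible vertex pairs.

```python
def empirical_edge_probability(graphs):
    """Maximum-likelihood Erdos-Renyi edge probability for a dataset:
    total observed edges divided by total possible vertex pairs.
    Each graph is given as a (num_vertices, num_edges) pair."""
    total_edges = sum(m for _, m in graphs)
    total_pairs = sum(n * (n - 1) // 2 for n, _ in graphs)
    return total_edges / total_pairs

# Toy dataset: three small graphs as (num_vertices, num_edges).
dataset = [(4, 3), (5, 5), (3, 2)]
print(empirical_edge_probability(dataset))  # 10 / 19 ~= 0.526
```

Nothing here requires gradient training, which is consistent with the row above: there are no learning rates or epochs to report for these closed-form fits.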