Entropy Coding of Unordered Data Structures
Authors: Julius Kunze, Daniel Severo, Giulio Zani, Jan-Willem van de Meent, James Townsend
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We release an implementation that can easily be adapted to different data types and statistical models, and demonstrate that our implementation achieves state-of-the-art compression rates on a range of graph datasets including molecular data. ... To demonstrate the method experimentally, we first applied it to the TUDatasets graphs (Morris et al., 2020), with a very simple Erdős-Rényi G(n, p) model for P. Table 2 shows a summary, highlighting the significance of the discount achieved by shuffle coding. ... We also compared directly to Bouritsas et al. (2021), who used a more sophisticated neural method to compress graphs (upper part of Table 3). |
| Researcher Affiliation | Academia | Julius Kunze, University College London (juliuskunze@gmail.com); Daniel Severo, University of Toronto and Vector Institute (d.severo@mail.utoronto.ca); Giulio Zani, University of Amsterdam (g.zani@uva.nl); Jan-Willem van de Meent, University of Amsterdam (j.w.vandemeent@uva.nl); James Townsend, University of Amsterdam (j.h.n.townsend@uva.nl) |
| Pseudocode | Yes | The following is an implementation of shuffle coding, showing, on the right, the effect of the steps on message length. `def encode(m, f): ...` / `def decode(m): ...` (excerpt of the paper's numbered listing; a hedged sketch of this encode/decode structure follows the table) |
| Open Source Code | Yes | We release source code with straightforward interfaces to enable future applications of shuffle coding with more sophisticated models and to classes of unordered objects other than graphs. (Footnote 1: Source code, data and results are available at https://github.com/juliuskunze/shuffle-coding.) |
| Open Datasets | Yes | To demonstrate the method experimentally, we first applied it to the TUDatasets graphs (Morris et al., 2020)... |
| Dataset Splits | No | Because PnC requires training, it was evaluated on a random test subset of each dataset, whereas shuffle coding was evaluated on entire datasets. |
| Hardware Specification | Yes | All shuffle coding speeds are for a single thread on a MacBook Pro 2018 with a 2.7GHz Intel Core i7 CPU. |
| Software Dependencies | No | For graphs, the canonical orderings we use are computed using the nauty and Traces libraries (McKay and Piperno, 2014). ... Routines for constructing and working with stabilizer chains are standard in computational group theory, and are implemented in SymPy (https://www.sympy.org/), as well as the GAP system (https://www.gap-system.org/)... (see the SymPy sketch after this table) |
| Experiment Setup | No | The paper describes the statistical models used (Erdős-Rényi G(n, p), Pólya urn) and how empirical probability vectors are computed, but does not provide details such as learning rates, batch sizes, or epoch counts typically associated with experimental setups for learned models. (A back-of-the-envelope codelength sketch for the Erdős-Rényi model follows the table.) |
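
The pseudocode row above quotes the paper's `encode`/`decode` listing only in fragments. As a hedged illustration of the same bits-back structure (not the paper's released implementation; all names and interfaces below are invented for this sketch), the following applies shuffle coding to the simplest unordered objects, multisets of symbols, on top of an exact big-integer ANS coder. Encoding first *decodes* which element to emit next from the empirical distribution of the remaining multiset (the bits-back step that produces the discount), then encodes that symbol under an i.i.d. model; decoding inverts the two steps in reverse order.

```python
# Hypothetical sketch of shuffle coding for multisets (not the paper's code).
import random
from collections import Counter

PREC = 1 << 16                         # total frequency of the i.i.d. symbol model
MODEL = {"A": (0, 1 << 15),            # symbol -> (start, freq); freqs sum to PREC
         "B": (1 << 15, 1 << 14),
         "C": (3 << 14, 1 << 14)}


class ANS:
    """Toy ANS coder: the whole message is one arbitrary-precision integer."""

    def __init__(self, state):
        self.state = state

    def push(self, start, freq, total):    # encode one interval onto the message
        self.state = (self.state // freq) * total + start + self.state % freq

    def peek(self, total):                 # read the current offset in [0, total)
        return self.state % total

    def pop(self, start, freq, total):     # remove the interval identified by peek
        self.state = (self.state // total) * freq + self.state % total - start


def empirical_interval(counts, x):
    """Interval of symbol x under the empirical distribution of a Counter."""
    start = sum(f for y, f in counts.items() if y < x)
    return start, counts[x], sum(counts.values())


def encode_multiset(ans, multiset):
    counts = Counter(multiset)
    for _ in range(sum(counts.values())):
        # Bits-back step: *decode* which element to remove next, using the
        # empirical distribution of the remaining multiset (this is where the
        # log2(#orderings) discount comes from).
        total, acc = sum(counts.values()), 0
        cf = ans.peek(total)
        for x in sorted(counts):
            if cf < acc + counts[x]:
                break
            acc += counts[x]
        ans.pop(acc, counts[x], total)
        # Ordinary step: encode the chosen symbol under the i.i.d. model.
        start, freq = MODEL[x]
        ans.push(start, freq, PREC)
        counts[x] -= 1
        if counts[x] == 0:
            del counts[x]


def decode_multiset(ans, n):
    counts = Counter()
    for _ in range(n):                     # invert the encoder's steps in reverse
        cf = ans.peek(PREC)                # decode one symbol from the model
        x = next(s for s, (st, f) in MODEL.items() if st <= cf < st + f)
        st, f = MODEL[x]
        ans.pop(st, f, PREC)
        counts[x] += 1
        # Restore the bits consumed by the encoder's bits-back step.
        ans.push(*empirical_interval(counts, x))
    return counts


ans = ANS(random.getrandbits(256))         # bits-back needs some initial bits
before = ans.state.bit_length()
data = Counter("ABBCCC")
encode_multiset(ans, data)
# Net growth ~ ordered codelength minus log2(6!/(1!*2!*3!)) = log2(60) bits.
print("net bits:", ans.state.bit_length() - before)
assert decode_multiset(ans, 6) == data     # round trip also restores the state
```

The net message growth is roughly the ordered codelength minus log2 of the number of distinct orderings, which for graphs corresponds to the log2(n!/|Aut(G)|) discount discussed in the paper.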
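
The dependencies row mentions nauty/Traces for canonical orderings and SymPy/GAP for stabilizer chains. As a small, hypothetical illustration (not code from the paper), SymPy's permutation-group routines can compute the automorphism-group quantities that determine the per-graph discount, here for a 4-cycle whose automorphism group has order 8:

```python
# Hypothetical example using SymPy's permutation-group machinery.
from math import factorial, log2
from sympy.combinatorics import Permutation, PermutationGroup

# Automorphism generators of the 4-cycle 0-1-2-3-0: one rotation, one reflection.
rot = Permutation([1, 2, 3, 0])        # i -> i+1 (mod 4)
ref = Permutation([0, 3, 2, 1])        # reflection fixing vertices 0 and 2
aut = PermutationGroup([rot, ref])

aut.schreier_sims()                    # base and strong generating set (stabilizer chain)
print("|Aut(G)|    :", aut.order())    # 8 (dihedral group of the square)
print("base        :", aut.base)       # points defining the stabilizer chain
print("orbit sizes :", [len(orbit) for orbit in aut.basic_orbits])

# Ideal shuffle coding discount relative to coding an ordered graph:
n = 4
print("discount    :", log2(factorial(n)) - log2(int(aut.order())), "bits")
```

Running this reports |Aut(G)| = 8, so the discount for this particular graph is log2(4!/8) ≈ 1.58 bits rather than the full log2(4!).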
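
For the experiment-setup row: the quantities the simplest reported experiment depends on are small enough to sketch by hand. The snippet below (our own back-of-the-envelope helpers, not the paper's code; the example sizes are arbitrary) computes the ordered Erdős-Rényi G(n, p) codelength of a graph and the largest possible shuffle coding discount, log2(n!), which shrinks to log2(n!/|Aut(G)|) for graphs with symmetries:

```python
# Hypothetical helpers for rough codelength accounting (not the paper's code).
from math import comb, factorial, log2

def erdos_renyi_bits(n, num_edges, p):
    """Bits to code an *ordered* simple graph under an Erdős-Rényi G(n, p) model."""
    pairs = comb(n, 2)
    return -num_edges * log2(p) - (pairs - num_edges) * log2(1 - p)

def max_discount_bits(n, aut_order=1):
    """Shuffle coding discount: log2 of the number of distinct vertex orderings."""
    return log2(factorial(n)) - log2(aut_order)

# Arbitrary small example; p is set to the empirical edge density.
n, e = 28, 60
p = e / comb(n, 2)
ordered = erdos_renyi_bits(n, e, p)
print(f"ordered    : {ordered:6.1f} bits")
print(f"discount   : {max_discount_bits(n):6.1f} bits  (at most, for |Aut(G)| = 1)")
print(f"unordered  : {ordered - max_discount_bits(n):6.1f} bits")
```

This sketch covers only the unlabelled edge structure under a single Bernoulli parameter; node and edge labels, and coding n itself, are outside its scope.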