Distribution Compression in Near-Linear Time

Authors: Abhishek Shetty, Raaz Dwivedi, Lester Mackey

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our benchmarks with high-dimensional Monte Carlo samples and Markov chains targeting challenging differential equation posteriors, Compress++ matches or nearly matches the accuracy of its input algorithm in orders of magnitude less time.
Researcher Affiliation | Collaboration | 1. Department of EECS, UC Berkeley; 2. Department of Computer Science, Harvard University and Department of EECS, MIT; 3. Microsoft Research New England
Pseudocode | Yes | Algorithm 1: COMPRESS (a sketch of the recursion follows the table)
Open Source Code | Yes | See the goodpoints Python package for Python implementations of all methods in this paper and https://github.com/microsoft/goodpoints for code reproducing each experiment.
Open Datasets | Yes | For the i.i.d. targets, we report MMD_k(P, P_out), which can be exactly computed in closed form. For the MCMC targets, we report the thinning error MMD_k(P_in, P_out) analyzed directly by our theory (Thms. 2 and 4). ... we adopt the four posterior targets of Riabiz et al. (2020a) based on the Goodwin (1965) model of oscillatory enzymatic control (d = 4), the Lotka (1925); Volterra (1926) model of oscillatory predator-prey evolution (d = 4), the Hinch et al. (2004) model of calcium signalling in cardiac cells (d = 38), and a tempered Hinch model posterior (d = 38). (A sketch of the MMD computation follows the table.)
Dataset Splits | No | The paper describes compressing input point sequences and evaluating the MMD error and runtime of the resulting coresets. It does not mention or specify traditional train/validation/test dataset splits, as its focus is on distribution compression rather than supervised model training.
Hardware Specification | Yes | All runtimes were measured on a single core of an Intel Xeon CPU.
Software Dependencies | No | The paper mentions the 'goodpoints Python package' and 'Python implementations' but does not specify exact version numbers for Python or any other software libraries.
Experiment Setup | Yes | For COMPRESS++, we use g = 4 throughout to satisfy the small relative error criterion (11) in all experiments. ... In all experiments involving kernel thinning, we set the algorithm failure probability parameter δ = 1/2 ... Throughout we use a Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2σ^2)) with σ^2 as specified by Dwivedi & Mackey (2021, Sec. K.2) for the MCMC targets and σ^2 = 2d otherwise. (These settings are combined in the Compress++ sketch below.)
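
For reference, Algorithm 1 (COMPRESS) can be summarized in a few lines: split the input into four blocks, recursively compress each, concatenate the results, and halve. Below is a minimal Python sketch of that recursion; `random_halve` is a placeholder stand-in for the paper's kernel-thinning halving step, not the goodpoints implementation, and the function names are ours.

```python
import numpy as np

def compress(points, halve, g=4):
    """Sketch of Algorithm 1 (COMPRESS) with oversampling parameter g.

    Assumes n = len(points) is a power of 4 with n >= 4**g; returns a
    coreset of 2**g * sqrt(n) points.
    """
    n = len(points)
    if n == 4 ** g:
        return points  # base case: small enough to return unchanged
    quarter = n // 4
    # Each recursive call on n/4 points returns 2**(g-1) * sqrt(n) points.
    blocks = [compress(points[i * quarter:(i + 1) * quarter], halve, g)
              for i in range(4)]
    combined = np.concatenate(blocks)  # 2**(g+1) * sqrt(n) points in total
    return halve(combined)             # halved back down to 2**g * sqrt(n)

def random_halve(points, rng=np.random.default_rng(0)):
    """Placeholder halver: uniform subsampling instead of kernel thinning."""
    idx = rng.choice(len(points), size=len(points) // 2, replace=False)
    return points[np.sort(idx)]

X = np.random.default_rng(1).normal(size=(4 ** 6, 2))  # n = 4096 input points
print(compress(X, random_halve).shape)  # (1024, 2): 2**4 * sqrt(4096) points
```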
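
The reported error metric has a simple closed form for empirical distributions: MMD_k(P_in, P_out)^2 equals the mean of k over input pairs, minus twice the mean over input-output pairs, plus the mean over output pairs. The NumPy sketch below computes this with the paper's Gaussian kernel; the function names are illustrative, not the goodpoints API.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma2):
    """Gaussian kernel matrix k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dists = (np.sum(X ** 2, axis=1)[:, None]
                + np.sum(Y ** 2, axis=1)[None, :] - 2.0 * X @ Y.T)
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma2))

def mmd(X_in, X_out, sigma2):
    """Empirical MMD_k between the empirical distributions of two point sets."""
    mmd2 = (gaussian_kernel(X_in, X_in, sigma2).mean()
            - 2.0 * gaussian_kernel(X_in, X_out, sigma2).mean()
            + gaussian_kernel(X_out, X_out, sigma2).mean())
    return np.sqrt(max(mmd2, 0.0))  # clip tiny negatives from rounding

# Example: thinning error between an input sample and a uniform subsample.
rng = np.random.default_rng(0)
d = 2
X_in = rng.normal(size=(1024, d))
X_out = X_in[rng.choice(1024, size=32, replace=False)]
print(mmd(X_in, X_out, sigma2=2 * d))  # paper's default bandwidth sigma^2 = 2d
```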
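
Finally, the reported settings (g = 4, δ = 1/2, Gaussian kernel with σ^2 = 2d) slot into the Compress++ pipeline roughly as follows. This sketch reuses the `compress`, `random_halve`, and `mmd` placeholders defined above; the actual procedure thins with kernel thinning at failure probability δ = 1/2 rather than by uniform subsampling.

```python
import numpy as np

# Reuses compress, random_halve, and mmd from the sketches above.

def compresspp(points, halve, thin, g=4):
    """Sketch of Compress++: run COMPRESS with oversampling g, then thin
    the 2**g * sqrt(n) intermediate coreset down to sqrt(n) points."""
    n = len(points)
    intermediate = compress(points, halve, g)
    return thin(intermediate, int(round(np.sqrt(n))))

def uniform_thin(points, m, rng=np.random.default_rng(2)):
    """Placeholder for kernel thinning with delta = 1/2: keep m random points."""
    idx = rng.choice(len(points), size=m, replace=False)
    return points[np.sort(idx)]

d = 2
X = np.random.default_rng(1).normal(size=(4 ** 6, d))
coreset = compresspp(X, random_halve, uniform_thin, g=4)  # 64 = sqrt(4096) points
print(mmd(X, coreset, sigma2=2 * d))  # thinning error, bandwidth sigma^2 = 2d
```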