Distribution Compression in Near-Linear Time
Authors: Abhishek Shetty, Raaz Dwivedi, Lester Mackey
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our benchmarks with high-dimensional Monte Carlo samples and Markov chains targeting challenging differential equation posteriors, Compress++ matches or nearly matches the accuracy of its input algorithm in orders of magnitude less time. |
| Researcher Affiliation | Collaboration | ¹Department of EECS, UC Berkeley; ²Department of Computer Science, Harvard University, and Department of EECS, MIT; ³Microsoft Research New England |
| Pseudocode | Yes | Algorithm 1: COMPRESS (a hedged Python sketch of this procedure follows the table) |
| Open Source Code | Yes | See the goodpoints Python package for Python implementations of all methods in this paper and https://github.com/microsoft/goodpoints for code reproducing each experiment. |
| Open Datasets | Yes | For the i.i.d. targets, we report MMDk(P, Pout) which can be exactly computed in closed-form. For the MCMC targets, we report the thinning error MMDk(Pin, Pout) analyzed directly by our theory (Thms. 2 and 4). ... we adopt the four posterior targets of Riabiz et al. (2020a) based on the Goodwin (1965) model of oscillatory enzymatic control (d = 4), the Lotka (1925); Volterra (1926) model of oscillatory predator-prey evolution (d = 4), the Hinch et al. (2004) model of calcium signalling in cardiac cells (d = 38), and a tempered Hinch model posterior (d = 38). |
| Dataset Splits | No | The paper describes compressing input point sequences and evaluating the MMD error and runtime of the resulting coresets. It does not mention or specify traditional train/validation/test dataset splits, as its focus is on distribution compression rather than supervised model training. |
| Hardware Specification | Yes | All runtimes were measured on a single core of an Intel Xeon CPU. |
| Software Dependencies | No | The paper mentions the 'goodpoints Python package' and 'Python implementations' but does not specify version numbers for Python or for any other software library. |
| Experiment Setup | Yes | For COMPRESS++, we use g = 4 throughout to satisfy the small relative error criterion (11) in all experiments. ... In all experiments involving kernel thinning, we set the algorithm failure probability parameter δ = 1/2 ... Throughout we use a Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2σ^2)) with σ^2 as specified by Dwivedi & Mackey (2021, Sec. K.2) for the MCMC targets and σ^2 = 2d otherwise. (A hedged kernel/MMD illustration follows the table.) |
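To make the pseudocode row concrete, here is a minimal Python sketch of the recursive COMPRESS meta-procedure (Algorithm 1), written under stated assumptions rather than copied from the paper's code. The function names `compress` and `naive_halve` are ours; `naive_halve` is only a stand-in for a genuine halving algorithm such as the kernel thinning halving round implemented in the `goodpoints` package, and the full Compress++ method would additionally thin the 2^g · √n returned points down to √n with a final root-level thinning step.

```python
import numpy as np

def compress(points, halve, g):
    """Sketch of the Compress meta-procedure (Algorithm 1).

    Reduces n input points to 2**g * sqrt(n) points by recursing on
    four quarters and halving the concatenated recursive outputs.
    Assumes n is a power of 4 with n >= 4**g; `halve` maps any
    sequence of m points to a subsequence of m // 2 points.
    """
    n = len(points)
    if n == 4 ** g:
        # Base case: n = 4**g already equals 2**g * sqrt(n) points.
        return points
    quarters = np.array_split(points, 4)
    # Each recursive call returns 2**g * sqrt(n / 4) = 2**(g - 1) * sqrt(n)
    # points, so the concatenation holds 2**(g + 1) * sqrt(n) points.
    combined = np.concatenate([compress(q, halve, g) for q in quarters])
    # One halving step brings the count down to the target 2**g * sqrt(n).
    return halve(combined)

def naive_halve(points):
    """Hypothetical stand-in for a real halving algorithm (e.g., a
    kernel thinning halving round): simply keeps every other point."""
    return points[::2]

rng = np.random.default_rng(0)
x = rng.standard_normal((4 ** 6, 2))        # n = 4096 points in d = 2
coreset = compress(x, naive_halve, g=2)     # 2**2 * sqrt(4096) = 256 points
assert len(coreset) == 256
```

Note the design choice this structure encodes: `halve` is only ever invoked on inputs of size 2^(g+1) · √m for a subproblem of size m, so even a quadratic-time halving routine yields near-linear total runtime for the recursion as a whole.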
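The thinning error reported above is a kernel maximum mean discrepancy. The snippet below is a hedged, self-contained illustration of that metric, not the paper's implementation (which lives in `goodpoints`): it evaluates the Gaussian kernel from the setup row and the resulting empirical MMD between an input point set and a crude coreset, using the σ^2 = 2d default quoted for the non-MCMC targets. The helper names `gaussian_kernel` and `mmd` are ours.

```python
import numpy as np

def gaussian_kernel(x, y, sigma_sq):
    """k(a, b) = exp(-||a - b||^2 / (2 * sigma_sq)) for all row pairs."""
    sq_dists = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma_sq))

def mmd(x, y, sigma_sq):
    """Empirical (V-statistic) MMD between the empirical distributions
    of the rows of x and the rows of y."""
    kxx = gaussian_kernel(x, x, sigma_sq).mean()
    kyy = gaussian_kernel(y, y, sigma_sq).mean()
    kxy = gaussian_kernel(x, y, sigma_sq).mean()
    return np.sqrt(max(kxx + kyy - 2.0 * kxy, 0.0))

d = 2
rng = np.random.default_rng(1)
points_in = rng.standard_normal((1024, d))
points_out = points_in[::32]                 # crude sqrt(n)-sized coreset
print(mmd(points_in, points_out, sigma_sq=2 * d))
```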