Debiased Distribution Compression
Authors: Lingxiao Li, Raaz Dwivedi, Lester Mackey
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, our techniques provide succinct and accurate posterior summaries while overcoming biases due to burn-in, approximate Markov chain Monte Carlo, and tempering. |
| Researcher Affiliation | Collaboration | ¹MIT CSAIL, ²Cornell Tech, ³Microsoft Research New England. |
| Pseudocode | Yes | Algorithm 1 Stein Kernel Thinning (SKT) |
| Open Source Code | Yes | Our open-source code is available as part of the Good Points Python library at https://github.com/microsoft/goodpoints. |
| Open Datasets | Yes | To evaluate this protocol, we compress a Bayesian logistic regression posterior conditioned on the Forest Covtype dataset (d = 54) using n = 2^24 approximate MCMC points from the stochastic gradient Fisher scoring sampler (Ahn et al., 2012) with batch size 32. Following Wang et al. (2024), we set M = −∇² log p(x_mode) at the sample mode x_mode and use 2^20 surrogate ground truth points from the No-U-Turn Sampler (Hoffman and Gelman, 2014) to evaluate energy distance. We find that our proposals improve upon standard thinning and Stein thinning for each compression task, not just in the optimized MMD metric (Fig. 2, top) but also in the auxiliary energy distance (Fig. 2, middle) and when measuring integration error for the mean (Fig. I.4). To test this proposal, we compress the cardiac calcium signaling model posterior (d = 38) of Riabiz et al. (2022, Sec. 4.3) with M = I and n = 3×10^6 tempered points from a Gaussian random walk Metropolis-Hastings chain. (Hedged energy-distance and NUTS sketches follow the table.) |
| Dataset Splits | No | The paper describes experiments involving compressing input point sequences but does not mention standard training, validation, or test dataset splits typically used for model evaluation and reproduction. |
| Hardware Specification | Yes | Each experiment was run with a single NVIDIA RTX 6000 GPU and an AMD EPYC 7513 32-Core CPU. |
| Software Dependencies | No | We implement our algorithms in JAX (Bradbury et al., 2018) and refer the reader to App. I for additional experiment details (including runtime comparison in Tab. I.1). Our open-source code is available as part of the Good Points Python library at https://github.com/microsoft/goodpoints. To generate the surrogate ground truth using NUTS, we used numpyro (Phan et al., 2019). The paper names JAX and numpyro but does not specify their version numbers. |
| Experiment Setup | Yes | For LD, we always use Q = 3. To ensure that the guarantees of Lem. F.3 and Thm. 4 hold while achieving fast convergence in practice, we take the step size of AMD to be 1/(8‖k_P‖_n) in the first adaptive round and 1/(8 ∑_{i∈[n]} w_i^(q−1) k_P(x_i, x_i)) in subsequent adaptive rounds. We use T = 7n_0 for AMD in all experiments. ... For Compress++, we use g = 4 in all experiments as in Shetty et al. (2022). For both Kernel Thinning and KT-Compress++, we choose δ = 1/2 as in the goodpoints library. (A hedged Stein-kernel and step-size sketch follows the table.) |
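
The Experiment Setup row quotes AMD step sizes built from the kernel diagonal k_P(x_i, x_i), and the Pseudocode row names Stein Kernel Thinning. The sketch below is a minimal, self-contained JAX illustration of a Langevin Stein kernel with an inverse-multiquadric base kernel, plus the subsequent-round step-size formula as reconstructed above. The function names, the IMQ base-kernel choice, and the diagonal stand-in for the preconditioner M are assumptions for illustration; this is not the goodpoints implementation.

```python
import jax
import jax.numpy as jnp

def imq(x, y, m_diag):
    # Inverse multiquadric base kernel; m_diag is a hypothetical diagonal
    # stand-in for the preconditioning matrix M quoted in the table.
    z = (x - y) * jnp.sqrt(m_diag)
    return (1.0 + jnp.dot(z, z)) ** -0.5

def stein_kernel(x, y, score_x, score_y, m_diag):
    # Langevin Stein kernel:
    #   k_P(x, y) = div_x div_y k + <grad_x k, score(y)>
    #               + <grad_y k, score(x)> + k(x, y) <score(x), score(y)>
    k = lambda a, b: imq(a, b, m_diag)
    grad_x = jax.grad(k, argnums=0)
    grad_y = jax.grad(k, argnums=1)
    # Trace of the mixed second-derivative matrix gives div_x div_y k.
    div_xy = jnp.trace(jax.jacfwd(grad_y, argnums=0)(x, y))
    return (div_xy
            + jnp.dot(grad_x(x, y), score_y)
            + jnp.dot(grad_y(x, y), score_x)
            + k(x, y) * jnp.dot(score_x, score_y))

def later_round_step_size(xs, scores, weights, m_diag):
    # 1 / (8 * sum_i w_i^(q-1) k_P(x_i, x_i)): the subsequent-round AMD
    # step size as reconstructed in the Experiment Setup row.
    diag = jax.vmap(lambda x, s: stein_kernel(x, x, s, s, m_diag))(xs, scores)
    return 1.0 / (8.0 * jnp.sum(weights * diag))
```

A hypothetical usage would call `later_round_step_size` on the n input points, their score vectors ∇log p(x_i), and the current simplex weights playing the role of w^(q−1).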
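
The Open Datasets row reports energy distance against 2^20 surrogate ground-truth points. Below is a minimal sketch of the sample (V-statistic) energy distance 2·E‖X−Y‖ − E‖X−X′‖ − E‖Y−Y′‖; it is a generic illustration, not the paper's evaluation code.

```python
import jax.numpy as jnp

def pairwise_dists(a, b):
    # Euclidean distance matrix between the rows of a (n, d) and b (m, d).
    sq = (jnp.sum(a ** 2, axis=1)[:, None]
          + jnp.sum(b ** 2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return jnp.sqrt(jnp.maximum(sq, 0.0))

def energy_distance(x, y):
    # V-statistic estimate of 2 E||X - Y|| - E||X - X'|| - E||Y - Y'||.
    return (2.0 * pairwise_dists(x, y).mean()
            - pairwise_dists(x, x).mean()
            - pairwise_dists(y, y).mean())

# e.g., energy_distance(coreset_points, surrogate_ground_truth_points)
```

For 2^20 ground-truth points the full distance matrix will not fit in memory, so in practice one would evaluate this in blocks or on a subsample.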
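
The Software Dependencies and Open Datasets rows mention generating the surrogate ground truth with NUTS via numpyro. The snippet below sketches that step for a Bayesian logistic regression; the standard-normal prior, the model name, and the warmup count are illustrative assumptions, and the paper's exact model is specified in its App. I.

```python
import jax
import jax.numpy as jnp
import numpyro
import numpyro.distributions as dist
from numpyro.infer import MCMC, NUTS

def logistic_regression(features, labels=None):
    # Hypothetical stand-in for the Covtype posterior (d = 54 features);
    # the prior below is an assumption, not the paper's specification.
    d = features.shape[1]
    beta = numpyro.sample("beta", dist.Normal(jnp.zeros(d), 1.0).to_event(1))
    numpyro.sample("obs", dist.Bernoulli(logits=features @ beta), obs=labels)

# Draw 2^20 surrogate ground-truth points with NUTS (Hoffman and Gelman, 2014).
mcmc = MCMC(NUTS(logistic_regression), num_warmup=1_000, num_samples=2 ** 20)
# mcmc.run(jax.random.PRNGKey(0), features, labels)
# surrogate_ground_truth = mcmc.get_samples()["beta"]
```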