Generalized Kernel Thinning

Authors: Raaz Dwivedi, Lester Mackey

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our experiments with target KT and KT+, we witness significant improvements in integration error even in 100 dimensions and when compressing challenging differential equation posteriors. ... In Sec. 4, we use our new tools to generate substantially compressed representations of both i.i.d. samples in dimensions d = 2 through 100 and Markov chain Monte Carlo samples targeting challenging differential equation posteriors. In line with our theory, we find that target KT and KT+ significantly improve both single function integration error and MMD, even for kernels without fast-decaying square-roots. (See the MMD sketch after this table.)
Researcher Affiliation | Collaboration | Raaz Dwivedi (1), Lester Mackey (2); (1) Department of Computer Science, Harvard University, and Department of EECS, MIT; (2) Microsoft Research New England; raaz@mit.edu, lmackey@microsoft.com
Pseudocode | Yes | Algorithm 1: Generalized Kernel Thinning; Algorithm 1a: KT-SPLIT; Algorithm 1b: KT-SWAP. (A simplified split-then-refine sketch appears after this table.)
Open Source Code | Yes | See App. I for supplementary experimental details and results and the goodpoints Python package (https://github.com/microsoft/goodpoints) for Python code reproducing all experiments.
Open Datasets | Yes | We consider three classes of target distributions on R^d: (i) mixture of Gaussians P = (1/M) ∑_{j=1}^{M} N(µ_j, I_2)...; (ii) Gaussian P = N(0, I_d)...; and (iii) the posteriors of four distinct coupled ordinary differential equation models: the Goodwin (1965) model..., the Lotka (1925) model..., the Hinch et al. (2004) model.... For settings (i) and (ii), we use an i.i.d. input sequence S_in from P... For setting (iii), we use MCMC input sequences S_in from 12 posterior inference experiments of Riabiz et al. (2020a)... (See the sampling sketch after this table.)
Dataset Splits | No | The paper does not explicitly describe dataset splits (e.g., train/validation/test percentages or counts) in the conventional machine-learning sense for model training and evaluation. It describes how the input data (S_in) is generated or obtained and how the output coreset (S_KT) is evaluated against the target distribution or the input sequence.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models or memory specifications) used to run the experiments.
Software Dependencies | No | The paper mentions the goodpoints Python package but does not specify any software names with version numbers (e.g., a Python version or specific library versions).
Experiment Setup | Yes | We focus on coresets of size √n produced from n inputs with δ_i = 1/(2n), let P_out denote the empirical distribution of each output coreset, and report mean error (±1 standard error) over 10 independent replicates of each experiment. ... For settings (i) and (ii), we use an i.i.d. input sequence S_in from P and kernel bandwidths σ = 1/γ = 2d. For setting (iii), we use MCMC input sequences S_in from 12 posterior inference experiments of Riabiz et al. (2020a) and set the bandwidths σ = 1/γ as specified by Dwivedi & Mackey (2021, Sec. K.2). (See the worked setup numbers after this table.)
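The integration-error and MMD metrics quoted in the Research Type row can be made concrete with a minimal sketch. The Gaussian kernel, bandwidth, and sample sizes below are illustrative assumptions rather than the paper's settings, and the uniform random subsample merely stands in for a kernel thinning coreset.

```python
import numpy as np

def gauss_kernel(X, Y, sigma=1.0):
    """Gaussian kernel matrix k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-sq / (2 * sigma**2))

def mmd(X, Y, sigma=1.0):
    """MMD between the empirical distributions of the rows of X and Y."""
    return np.sqrt(gauss_kernel(X, X, sigma).mean()
                   + gauss_kernel(Y, Y, sigma).mean()
                   - 2 * gauss_kernel(X, Y, sigma).mean())

# Toy comparison: a uniform random subsample standing in for a KT coreset
rng = np.random.default_rng(0)
S_in = rng.standard_normal((1024, 2))                      # input sample
coreset = S_in[rng.choice(len(S_in), 32, replace=False)]   # candidate coreset
print("MMD(coreset, S_in):", mmd(coreset, S_in))
```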
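The pseudocode row names a two-phase procedure: KT-SPLIT repeatedly halves the input, and KT-SWAP refines the selected half. The sketch below is a heavily simplified, deterministic stand-in for that split-then-refine structure, not the authors' randomized algorithm; the kernel choice, selection rule, and sizes are all assumptions made for illustration.

```python
import numpy as np

def k_vec(x, Y, sigma=1.0):
    """Gaussian kernel evaluations k(x, y) for one point x against rows of Y."""
    return np.exp(-np.sum((Y - x) ** 2, axis=1) / (2 * sigma**2))

def split_half(X, sigma=1.0, seed=0):
    """Simplified halving pass: keep one point of each consecutive pair,
    preferring the point less represented among the points kept so far."""
    rng = np.random.default_rng(seed)
    kept = []
    for i in range(0, len(X) - 1, 2):
        x0, x1 = X[i], X[i + 1]
        if not kept:
            kept.append(x0 if rng.random() < 0.5 else x1)
            continue
        K = np.asarray(kept)
        kept.append(x0 if k_vec(x0, K, sigma).sum() <= k_vec(x1, K, sigma).sum() else x1)
    return np.asarray(kept)

def swap_refine(coreset, X, sigma=1.0):
    """Simplified refinement pass: replace each coreset point by any input
    point that lowers the squared MMD to the full input sample."""
    def mmd2(C):
        kcc = np.mean([k_vec(c, C, sigma).mean() for c in C])
        kcx = np.mean([k_vec(c, X, sigma).mean() for c in C])
        return kcc - 2 * kcx  # the constant k(X, X) term is omitted
    C = coreset.copy()
    for i in range(len(C)):
        best, best_val = C[i].copy(), mmd2(C)
        for x in X:
            C[i] = x
            val = mmd2(C)
            if val < best_val:
                best, best_val = x.copy(), val
        C[i] = best
    return C

# Compress 128 input points to a 32-point coreset via two halving passes
rng = np.random.default_rng(1)
X = rng.standard_normal((128, 2))
coreset = swap_refine(split_half(split_half(X)), X)
print(coreset.shape)
```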
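For the Open Datasets row, drawing the synthetic inputs of settings (i) and (ii) is straightforward; the number of mixture components, their means, and the sample size below are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1024                                   # input sample size (assumption)

# Setting (i): mixture of Gaussians P = (1/M) * sum_{j=1}^{M} N(mu_j, I_2)
M = 4                                      # number of components (assumption)
mus = rng.uniform(-5, 5, size=(M, 2))      # component means (assumption)
labels = rng.integers(0, M, size=n)
S_in_mixture = mus[labels] + rng.standard_normal((n, 2))

# Setting (ii): standard Gaussian P = N(0, I_d)
d = 100
S_in_gauss = rng.standard_normal((n, d))
```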
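Finally, a worked reading of the Experiment Setup row, assuming coresets of size √n as quoted; the input size n = 2^14 and the dimension are illustrative assumptions.

```python
import math

n = 2 ** 14                        # illustrative input size (assumption)
coreset_size = math.isqrt(n)       # coresets of size sqrt(n) -> 128
rounds = int(math.log2(n // coreset_size))  # halving rounds: log2(n / sqrt(n)) = 7
delta_i = 1 / (2 * n)              # failure parameter delta_i = 1/(2n)
d = 100
sigma = 2 * d                      # bandwidth sigma = 1/gamma = 2d for settings (i)-(ii)
print(coreset_size, rounds, delta_i, sigma)
```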