Kernel Stein Discrepancy Descent

Authors: Anna Korba, Pierre-Cyril Aubin-Frankowski, Szymon Majewski, Pierre Ablin

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we discuss the performance of KSD Descent to sample from π in practice, on toy examples and real-world problems. The code to reproduce the experiments and a package to use KSD Descent are available at https://github.com/pierreablin/ksddescent.
Researcher Affiliation | Academia | 1 CREST, ENSAE, Institut Polytechnique de Paris; 2 CAS, MINES ParisTech, Paris, France; 3 CMAP, École Polytechnique, Institut Polytechnique de Paris; 4 CNRS and DMA, École Normale Supérieure, Paris, France.
Pseudocode | Yes | Algorithm 1: KSD Descent GD (a runnable sketch is given after this table).
Open Source Code | Yes | The code to reproduce the experiments and a package to use KSD Descent are available at https://github.com/pierreablin/ksddescent.
Open Datasets | No | We compare KSD Descent and SVGD in the Bayesian logistic regression setting described in Gershman et al. (2012); Liu & Wang (2016). Given datapoints $d_1, \dots, d_q \in \mathbb{R}^p$ and labels $y_1, \dots, y_q \in \{\pm 1\}$, the labels $y_i$ are modelled as $p(y_i = 1 \mid d_i, w) = (1 + \exp(-w^\top d_i))^{-1}$ for some $w \in \mathbb{R}^p$. The parameters $w$ follow the law $p(w \mid \alpha) = \mathcal{N}(0, \alpha^{-1} I_p)$, and $\alpha > 0$ is drawn from an exponential law $p(\alpha) = \mathrm{Exp}(0.01)$. The parameter vector is then $x = [w, \log(\alpha)] \in \mathbb{R}^{p+1}$, and we use Algorithm 2 to obtain samples from $p(x \mid (d_i, y_i)_{i=1}^q)$ for 13 datasets, with $N = 10$ particles for each. (A sketch of the corresponding score function is given after this table.)
Dataset Splits | No | No explicit information on dataset splits (e.g., percentages or counts for training, validation, or testing) is provided in the paper.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., exact GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper cites software packages such as NumPy, SciPy, and PyTorch in its references, but it does not specify the version numbers of these dependencies used for the experiments.
Experiment Setup | Yes | We use $N = 10$ particles, and take 1000 samples $x$ from the ICA model for $p \in \{2, 4, 8\}$. Each method outputs $N$ estimated unmixing matrices $[\hat{W}_i]_{i=1}^N$. We compute the Amari distance (Amari et al., 1996) between each $\hat{W}_i$ and $W$: the Amari distance vanishes if and only if the two matrices are the same up to scale and permutation, which are the natural indeterminacies of ICA. We repeat the experiment 50 times, resulting in 500 values for each algorithm (Figure 5). We also add the results of a random output, where the estimated matrices are obtained with i.i.d. $\mathcal{N}(0, 1)$ entries. (A sketch of the Amari distance is given after this table.)
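To make the quoted Algorithm 1 (KSD Descent GD) concrete, here is a minimal sketch of gradient descent on the particle positions to minimize the squared kernel Stein discrepancy. It assumes a Gaussian kernel $k(x, y) = \exp(-\|x - y\|^2 / (2h))$ with a fixed bandwidth $h$, for which the Stein kernel has a closed form; the function names are illustrative and this is not the API of the authors' `ksddescent` package, which also provides the L-BFGS variant (Algorithm 2).

```python
import torch

def gaussian_stein_kernel(x, y, score_x, score_y, h):
    """Stein kernel u_pi(x, y) for the Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2h))."""
    d = x.shape[1]
    diffs = x[:, None, :] - y[None, :, :]       # (N, M, d) pairwise differences x_i - y_j
    sqdists = (diffs ** 2).sum(-1)              # (N, M) squared distances
    k = torch.exp(-sqdists / (2 * h))           # base kernel values
    term1 = (score_x @ score_y.T) * k                          # s(x)^T s(y) k(x, y)
    term2 = ((score_x[:, None, :] * diffs).sum(-1) / h) * k    # s(x)^T grad_y k(x, y)
    term3 = (-(score_y[None, :, :] * diffs).sum(-1) / h) * k   # grad_x k(x, y)^T s(y)
    term4 = (d / h - sqdists / h ** 2) * k                     # trace of grad_x grad_y k(x, y)
    return term1 + term2 + term3 + term4

def ksd_descent_gd(x0, score, step=0.1, n_iter=1000, h=1.0):
    """Gradient descent on particle positions to minimize the squared KSD."""
    x = x0.clone().requires_grad_(True)
    for _ in range(n_iter):
        s = score(x)
        loss = gaussian_stein_kernel(x, x, s, s, h).mean()  # KSD^2 up to constants
        grad, = torch.autograd.grad(loss, x)
        with torch.no_grad():
            x -= step * grad
    return x.detach()

# Usage: sample from a standard 2-D Gaussian, whose score is s(x) = -x.
x0 = torch.rand(50, 2)
samples = ksd_descent_gd(x0, score=lambda x: -x)
```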
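For the Bayesian logistic regression row above, the sampler needs the score $s(x) = \nabla_x \log p(x \mid (d_i, y_i)_{i=1}^q)$. Below is a sketch under the hierarchical model quoted there, with the $\alpha = \exp(\log \alpha)$ change of variables handled by its log-Jacobian; the helper names are assumptions, and the gradient is left to autograd.

```python
import torch

def log_posterior(X, D, y):
    """Unnormalized log-posterior of x = [w, log(alpha)], one row of X per particle.
    D is the (q, p) design matrix, y the (q,) labels in {-1, +1} (floats)."""
    w, log_alpha = X[:, :-1], X[:, -1]
    alpha = torch.exp(log_alpha)
    # Logistic likelihood: p(y_i = 1 | d_i, w) = sigmoid(w^T d_i)
    log_lik = torch.nn.functional.logsigmoid(y[None, :] * (w @ D.T)).sum(dim=1)
    # Gaussian prior w | alpha ~ N(0, alpha^{-1} I_p)
    p = w.shape[1]
    log_prior_w = 0.5 * p * log_alpha - 0.5 * alpha * (w ** 2).sum(dim=1)
    # Exp(0.01) prior on alpha, plus the log-Jacobian of alpha = exp(log_alpha)
    log_prior_alpha = -0.01 * alpha + log_alpha
    return log_lik + log_prior_w + log_prior_alpha

def score(X, D, y):
    """Score s(x) = grad_x log p(x | data), one row per particle, via autograd."""
    X = X if X.requires_grad else X.detach().requires_grad_(True)
    # create_graph=True so a sampler can differentiate through the score.
    return torch.autograd.grad(log_posterior(X, D, y).sum(), X, create_graph=True)[0]
```

A particle set could then be evolved with the sketch above via `score_fn = lambda x: score(x, D, y)`.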
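Finally, the Amari distance from the ICA experiment has a short closed form. A sketch in NumPy, assuming $W$ is the true unmixing matrix, so that $P = \hat{W} W^{-1}$ reduces to a scaled permutation exactly when the estimate is correct up to ICA's indeterminacies; note that normalization conventions for this distance vary across papers.

```python
import numpy as np

def amari_distance(W_hat, W):
    """Amari distance (Amari et al., 1996) between two unmixing matrices.
    Vanishes iff W_hat equals W up to row scaling and permutation."""
    P = np.abs(W_hat @ np.linalg.inv(W))  # scaled permutation at the optimum

    def s(r):
        # Penalizes rows of r that do not have a single dominant entry.
        return np.sum(np.sum(r ** 2, axis=1) / np.max(r ** 2, axis=1) - 1)

    return (s(P) + s(P.T)) / (2 * P.shape[0])

# Sanity check: the distance is zero when W_hat equals W up to permutation and scale.
W = np.random.randn(4, 4)
perm_scale = np.diag([2.0, -1.0, 0.5, 3.0]) @ np.eye(4)[[2, 0, 3, 1]]
print(amari_distance(perm_scale @ W, W))  # ~0.0
```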