Sliced Kernelized Stein Discrepancy

Authors: Wenbo Gong, Yingzhen Li, José Miguel Hernández-Lobato

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The paper is experimental: 'extensive experiments show the proposed discrepancy significantly outperforms KSD and various baselines in high dimensions. For model learning, we show its advantages over existing Stein discrepancy baselines by training independent component analysis models with different discrepancies. We further propose a novel particle inference method called sliced Stein variational gradient descent (S-SVGD) which alleviates the mode-collapse issue of SVGD in training variational autoencoders.'
Researcher Affiliation | Collaboration | Wenbo Gong, University of Cambridge (wg242@cam.ac.uk); Yingzhen Li, Imperial College London (yingzhen.li@imperial.ac.uk); José Miguel Hernández-Lobato, University of Cambridge and The Alan Turing Institute (jmh233@cam.ac.uk). Work done at Microsoft Research Cambridge.
Pseudocode | Yes | Algorithm 1: GOF test with maxSKSD U-statistics (a plain KSD U-statistic test in the same style is sketched after the table).
Open Source Code | No | The paper provides no explicit statement about, or link to, open-source code for the described methodology.
Open Datasets | Yes | Gaussian GOF benchmarks (Jitkrittum et al., 2017; Huggins & Mackey, 2018; Chwialkowski et al., 2016); RBM benchmarks (Liu et al., 2016; Huggins & Mackey, 2018; Jitkrittum et al., 2017); binarized MNIST; UCI datasets (Dua & Graff, 2017).
Dataset Splits | No | No explicit percentages, sample counts, or split methodology for train/validation/test sets are provided for the experiments. For binarized MNIST, the paper mentions the 'first 5,000 test images' but gives no details on training/validation splits.
Hardware Specification | No | The paper does not specify the hardware (e.g., exact GPU/CPU models, memory amounts) used to run its experiments.
Software Dependencies | No | The paper mentions 'Adam' as an optimizer but does not provide version numbers for any software dependencies or libraries.
Experiment Setup | Yes | For the goodness-of-fit tests, the significance level is set to α = 0.05. For ICA, 'data sampled from a randomly initialized ICA model' are used. For VAEs, 'The decoder is trained as in vanilla VAEs, but the encoder is trained by amortization', and 'For fair comparisons, we do not tune the coefficient of the repulsive force.' Detailed hyperparameters such as learning rates, batch sizes, and training schedules are not given in the main text (the repulsive force term appears explicitly in the background SVGD sketch after the table).
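
The GOF test referenced in the Pseudocode row (Algorithm 1, a maxSKSD U-statistic test run at α = 0.05) follows the standard recipe: compute a Stein-discrepancy U-statistic on the samples and compare it to a bootstrap threshold. The sketch below is a minimal illustration of that recipe using the plain KSD U-statistic of Liu et al. (2016) with an RBF kernel, a standard Gaussian target, a fixed bandwidth, and a simple Rademacher wild bootstrap; these choices and the function names are illustrative assumptions, and the paper's maxSKSD statistic (which additionally optimizes over slice directions) is not reproduced here.

```python
import numpy as np

def gaussian_score(x):
    """Score grad_x log p(x) of a standard Gaussian target (illustrative choice)."""
    return -x

def ksd_u_matrix(x, score_fn, bandwidth=1.0):
    """Pairwise Stein kernel u_p(x_i, x_j) with an RBF base kernel (Liu et al., 2016)."""
    n, d = x.shape
    s = score_fn(x)                                  # (n, d) target scores at the samples
    diff = x[:, None, :] - x[None, :, :]             # diff[i, j] = x_i - x_j
    sqdist = np.sum(diff ** 2, axis=-1)
    h2 = bandwidth ** 2
    k = np.exp(-sqdist / (2.0 * h2))                 # RBF kernel matrix

    term1 = (s @ s.T) * k                                    # s(x)^T s(y) k(x, y)
    term2 = np.einsum('id,ijd->ij', s, diff) / h2 * k        # s(x)^T grad_y k(x, y)
    term3 = -np.einsum('jd,ijd->ij', s, diff) / h2 * k       # s(y)^T grad_x k(x, y)
    term4 = (d / h2 - sqdist / h2 ** 2) * k                  # trace(grad_x grad_y k)
    return term1 + term2 + term3 + term4

def ksd_gof_test(x, score_fn, alpha=0.05, n_boot=1000, seed=0):
    """U-statistic KSD goodness-of-fit test with a Rademacher wild bootstrap."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    u = ksd_u_matrix(x, score_fn)
    np.fill_diagonal(u, 0.0)                         # U-statistic: drop i == j terms
    stat = u.sum() / (n * (n - 1))

    boot = np.empty(n_boot)
    for b in range(n_boot):
        eps = rng.choice([-1.0, 1.0], size=n)        # random sign multipliers
        boot[b] = (eps @ u @ eps) / (n * (n - 1))
    threshold = np.quantile(boot, 1.0 - alpha)       # reject if stat exceeds the quantile
    return stat > threshold, stat, threshold

# Toy usage: samples drawn from the target itself should be rejected only ~5% of the time.
samples = np.random.default_rng(1).normal(size=(200, 5))
print(ksd_gof_test(samples, gaussian_score, alpha=0.05))
```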
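
For context on the S-SVGD mentioned in the Research Type and Experiment Setup rows: it builds on Stein variational gradient descent (SVGD), whose particle update combines a kernel-weighted score ('driving') term with a repulsive term, the same repulsive force whose coefficient the paper says it does not tune. Below is a minimal sketch of one vanilla SVGD step (Liu & Wang, 2016) on a toy Gaussian target; the step size, bandwidth, and target are illustrative assumptions, and the sketch does not implement the paper's sliced variant, which the paper reports alleviates SVGD's mode collapse when training VAE encoders by amortization.

```python
import numpy as np

def gaussian_score(x):
    """Score grad_x log p(x) of a standard Gaussian target (illustrative choice)."""
    return -x

def svgd_update(x, score_fn, step_size=0.1, bandwidth=1.0):
    """One vanilla SVGD step (Liu & Wang, 2016) on particles x of shape (n, d)."""
    n, _ = x.shape
    diff = x[:, None, :] - x[None, :, :]          # diff[i, j] = x_i - x_j
    sqdist = np.sum(diff ** 2, axis=-1)
    h2 = bandwidth ** 2
    k = np.exp(-sqdist / (2.0 * h2))              # RBF kernel matrix k(x_i, x_j)

    # Driving term: (1/n) sum_j k(x_j, x_i) * score(x_j) pulls particles toward high density.
    drive = k.T @ score_fn(x) / n
    # Repulsive term: (1/n) sum_j grad_{x_j} k(x_j, x_i) pushes particles apart.
    repulse = np.einsum('ij,ijd->id', k, diff) / h2 / n
    return x + step_size * (drive + repulse)

# Toy usage: repeated updates move scattered particles toward the Gaussian target.
particles = np.random.default_rng(0).normal(loc=5.0, size=(100, 2))
for _ in range(500):
    particles = svgd_update(particles, gaussian_score)
print(particles.mean(axis=0))                     # should approach the target mean (0, 0)
```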