Learning deep kernels for exponential family densities

Authors: Li Wenliang, Danica J. Sutherland, Heiko Strathmann, Arthur Gretton

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "in empirical studies, deep maximum-likelihood models can yield higher likelihoods, while our approach gives better estimates of the gradient of the log density, the score, which describes the distribution's shape."
Researcher Affiliation | Academia | Gatsby Computational Neuroscience Unit, University College London, London, U.K.
Pseudocode | Yes | Algorithm 1 (full training procedure):
    input: dataset D; initial inducing points z, kernel parameters w, regularization λ = (λ_α, λ_C)
    Split D into D1 and D2.
    Optimize w, λ, z, and optionally the q0 parameters:
        while Ĵ(p̂_{α(λ, k_w, z, D1)}^{k_w, z}, D2) is still improving do
            sample disjoint data subsets Dt, Dv ⊂ D1
            f(·) = Σ_{m=1}^{M} α_m(λ, k_w, z, Dt) k_w(z_m, ·)
            Ĵ = (1/|Dv|) Σ_{n=1}^{|Dv|} Σ_{d=1}^{D} [ ∂²_d f(x_n) + ½ (∂_d f(x_n))² ]
            take an SGD step in Ĵ for w, λ, z, and optionally the q0 parameters
        end
    Optimize λ for fitting on larger batches:
        while Ĵ(p̂_{α(λ, k_w, z, D1)}^{k_w, z}, D2) is still improving do
            f(·) = Σ_{m=1}^{M} α_m(λ, k_w, z, D1) k_w(·, z_m)
            sample a subset Dv ⊂ D2
            Ĵ = (1/|Dv|) Σ_{n=1}^{|Dv|} Σ_{d=1}^{D} [ ∂²_d f(x_n) + ½ (∂_d f(x_n))² ]
            take SGD steps in Ĵ for λ only
        end
    Finalize α on D1: find α = α(λ, k_w, z, D1)
    return: log p̃(·) = Σ_{m=1}^{M} α_m k_w(·, z_m) + log q0(·)
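The quoted algorithm optimizes the score-matching objective Ĵ by stochastic gradient descent. Below is a minimal Python (JAX) sketch of how Ĵ can be computed for an unnormalized log density f(x) = Σ_m α_m k_w(z_m, x) + log q0(x); it is not the authors' TensorFlow implementation. The Gaussian kernel, the standard-normal q0, and treating α as a free parameter (rather than the closed-form fit α(λ, k_w, z, Dt) used in Algorithm 1) are simplifying assumptions for illustration.

```python
# Hedged sketch of the score-matching objective J-hat from Algorithm 1.
# Not the authors' code: kernel, base density q0, and free alpha are assumptions.
import jax
import jax.numpy as jnp

def gaussian_kernel(x, z, sigma=1.0):
    # Illustrative Gaussian kernel k_w(z, x); the paper uses a learned deep kernel.
    return jnp.exp(-jnp.sum((x - z) ** 2) / (2 * sigma ** 2))

def log_density(x, alpha, Z, sigma=1.0):
    # f(x) = sum_m alpha_m k(z_m, x) + log q0(x), with q0 a standard normal here.
    k = jax.vmap(lambda z: gaussian_kernel(x, z, sigma))(Z)
    return jnp.dot(alpha, k) - 0.5 * jnp.sum(x ** 2)

def score_matching_loss(alpha, Z, X, sigma=1.0):
    f = lambda x: log_density(x, alpha, Z, sigma)
    grad_f = jax.grad(f)                                 # first derivatives  d_d f(x)
    hess_diag = lambda x: jnp.diag(jax.hessian(f)(x))    # second derivatives d_d^2 f(x)
    per_point = jax.vmap(lambda x: jnp.sum(hess_diag(x) + 0.5 * grad_f(x) ** 2))
    return jnp.mean(per_point(X))                        # J-hat averaged over the batch

# Example: 3 inducing points in 2-D, evaluated on a small batch.
Z = jax.random.normal(jax.random.PRNGKey(0), (3, 2))
alpha = jnp.ones(3) * 0.1
X = jax.random.normal(jax.random.PRNGKey(1), (16, 2))
print(score_matching_loss(alpha, Z, X))
```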
Open Source Code | Yes | Code for DKEF is at github.com/kevin-w-li/deep-kexpfam.
Open Datasets | Yes | "we trained DKEF and the likelihood-based models on five UCI datasets (Dheeru & Karra Taniskidou, 2017); in particular, we used Red Wine, White Wine, Parkinson, HepMass, and MiniBoone."
Dataset Splits | Yes | Algorithm 1 splits D into D1 and D2, then, while optimizing w, λ, z, and optionally the q0 parameters, samples disjoint data subsets Dt, Dv ⊂ D1. Also, from Section 3: 'We can avoid this problem and additionally find the best values for the regularization weights λ with a form of meta-learning. We find choices for the kernel and regularization which will give us a good value of Ĵ on a validation set Dv when fit to a fresh training set Dt.'
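As a hedged illustration of the splitting scheme described in this entry (not taken from the released code), the sketch below draws D1 and D2 from the dataset and then samples disjoint minibatches Dt and Dv from D1 at each step; the array names and split sizes are arbitrary.

```python
# Minimal sketch of the D -> (D1, D2) split and the disjoint Dt/Dv minibatches.
import numpy as np

rng = np.random.default_rng(0)
D = rng.normal(size=(10_000, 5))               # placeholder dataset
perm = rng.permutation(len(D))
D1, D2 = D[perm[:8_000]], D[perm[8_000:]]      # D = D1 ∪ D2 (split ratio is illustrative)

def sample_disjoint_minibatches(D1, batch_size, rng):
    idx = rng.permutation(len(D1))
    Dt = D1[idx[:batch_size]]                  # used to fit alpha(lambda, k_w, z, Dt)
    Dv = D1[idx[batch_size:2 * batch_size]]    # used to evaluate J-hat for the SGD step
    return Dt, Dv

Dt, Dv = sample_disjoint_minibatches(D1, 200, rng)
```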
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running the experiments were provided.
Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9, CPLEX 12.4) were listed. The paper mentions 'TensorFlow operations' but does not specify a version.
Experiment Setup | Yes | "For the models above, we use layers of width 30 for experiments on synthetic data, and 100 for benchmark datasets. Larger values did not improve performance. ... DKEF. On synthetic datasets, we consider four variants of our model with one kernel component, R = 1. ... DKEF-G-15 has the kernel (7), with L = 3 layers of width W = 15. DKEF-G-50 is the same with W = 50. ... In all experiments, q0(x) = ∏_{d=1}^{D} exp(−|x_d − μ_d|^{β_d} / (2σ_d²)), with β_d > 1. On benchmark datasets, we use DKEF-G-50 and KEF-G with three kernel components, R = 3. ... Note that we trained DKEF while adding Gaussian noise with standard deviation 0.05 to the (whitened) dataset."
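To make the quoted base density concrete, here is a small Python sketch of the unnormalized log q0 and of the noise-injection step. The form of q0, the constraint β_d > 1, and the 0.05 noise scale on whitened data follow the quote; the specific parameter values, shapes, and helper names are illustrative assumptions rather than the authors' code.

```python
# Hedged sketch: generalized-Gaussian base density q0 (up to its normalizer,
# which score matching does not require) and Gaussian noise injection (std 0.05).
import numpy as np

def log_q0_unnormalized(x, mu, sigma, beta):
    # x: (N, D); mu, sigma, beta: (D,), with beta > 1 assumed enforced elsewhere.
    return -np.sum(np.abs(x - mu) ** beta / (2.0 * sigma ** 2), axis=-1)

rng = np.random.default_rng(0)
X = rng.normal(size=(1_000, 4))                       # stands in for already-whitened data
X_noisy = X + 0.05 * rng.normal(size=X.shape)         # noise added during training

mu, sigma, beta = np.zeros(4), np.ones(4), np.full(4, 1.5)   # illustrative values
print(log_q0_unnormalized(X_noisy, mu, sigma, beta)[:3])
```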