Learning deep kernels for exponential family densities
Authors: Li Wenliang, Danica J. Sutherland, Heiko Strathmann, Arthur Gretton
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In empirical studies, deep maximum-likelihood models can yield higher likelihoods, while our approach gives better estimates of the gradient of the log density, the score, which describes the distribution's shape. |
| Researcher Affiliation | Academia | Gatsby Computational Neuroscience Unit, University College London, London, U.K. |
| Pseudocode | Yes | Algorithm 1: Full training procedure. Input: dataset D; initial inducing points z, kernel parameters w, regularization λ = (λ_α, λ_C). Split D into D1 and D2. Optimize w, λ, z, and maybe q0 params: while Ĵ(p̂^{k_w}_{α(λ, k_w, z, D1)}, z, D2) is still improving, do: sample disjoint data subsets D_t, D_v ⊂ D1; set f(·) = Σ_{m=1}^{M} α_m(λ, k_w, z, D_t) k_w(z_m, ·); compute Ĵ = (1/|D_v|) Σ_{n=1}^{|D_v|} Σ_{d=1}^{D} [∂_d² f(x_n) + ½ (∂_d f(x_n))²]; take an SGD step in Ĵ for w, λ, z, and maybe q0 params. Optimize λ for fitting on larger batches: while Ĵ(p̂^{k_w}_{α(λ, k_w, z, D1)}, z, D2) is still improving, do: set f(·) = Σ_{m=1}^{M} α_m(λ, k_w, z, D1) k_w(·, z_m); sample a subset D_v ⊂ D2; compute Ĵ = (1/|D_v|) Σ_{n=1}^{|D_v|} Σ_{d=1}^{D} [∂_d² f(x_n) + ½ (∂_d f(x_n))²]; take SGD steps in Ĵ for λ only. Finalize α on D1: find α = α(λ, k_w, z, D1). Return: log p̂(·) = Σ_{m=1}^{M} α_m k_w(·, z_m) + log q0(·). [A code sketch of Ĵ and of this two-stage loop appears after the table.] |
| Open Source Code | Yes | Code for DKEF is at github.com/kevin-w-li/deep-kexpfam. |
| Open Datasets | Yes | We trained DKEF and the likelihood-based models on five UCI datasets (Dheeru & Karra Taniskidou, 2017); in particular, we used Red Wine, White Wine, Parkinson, Hep Mass, and Mini Boone. |
| Dataset Splits | Yes | Split D into D1 and D2. Optimize w, λ, z, and maybe q0 params: while Ĵ(p̂^{k_w}_{α(λ, k_w, z, D1)}, z, D2) is still improving, do: sample disjoint data subsets D_t, D_v ⊂ D1. Also, from Section 3: 'We can avoid this problem and additionally find the best values for the regularization weights λ with a form of meta-learning. We find choices for the kernel and regularization which will give us a good value of Ĵ on a validation set D_v when fit to a fresh training set D_t.' [The second sketch after the table lays out this split.] |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running experiments were provided. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9, CPLEX 12.4) were listed. The paper mentions 'TensorFlow operations' but does not specify a version. |
| Experiment Setup | Yes | For the models above, we use layers of width 30 for experiments on synthetic data, and 100 for benchmark datasets. Larger values did not improve performance. ... DKEF. On synthetic datasets, we consider four variants of our model with one kernel component, R = 1. ... DKEF-G-15 has the kernel (7), with L = 3 layers of width W = 15. DKEF-G-50 is the same with W = 50. ... In all experiments, q0(x) = ∏_{d=1}^{D} exp(−|x_d − μ_d|^{β_d} / (2σ_d²)), with β_d > 1. On benchmark datasets, we use DKEF-G-50 and KEF-G with three kernel components, R = 3. ... Note that we trained DKEF while adding Gaussian noise with standard deviation 0.05 to the (whitened) dataset. [This q0 form appears in the first sketch after the table.] |
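
The pseudocode and experiment-setup rows both hinge on the score-matching objective Ĵ and the base density q0. As a rough illustration only, here is a minimal JAX sketch (not the authors' TensorFlow implementation) of the unnormalized log density log p̂(x) = Σ_m α_m k_w(z_m, x) + log q0(x) and of Ĵ. The plain Gaussian kernel on raw inputs, the parameter names, and the example values are simplifying assumptions; DKEF instead applies the Gaussian kernel to deep network features.

```python
import jax
import jax.numpy as jnp

def gaussian_kernel(x, z, sigma=1.0):
    # k_w(x, z) on raw inputs; DKEF applies this to deep network features phi_w(x), phi_w(z)
    return jnp.exp(-jnp.sum((x - z) ** 2) / (2.0 * sigma ** 2))

def log_q0(x, mu, log_sigma, beta):
    # base density q0(x) = prod_d exp(-|x_d - mu_d|^beta_d / (2 sigma_d^2)), up to normalization
    return jnp.sum(-jnp.abs(x - mu) ** beta / (2.0 * jnp.exp(2.0 * log_sigma)))

def log_p(x, alpha, z, mu, log_sigma, beta):
    # unnormalized log density: f(x) + log q0(x), with f(x) = sum_m alpha_m k_w(z_m, x)
    f = jnp.dot(alpha, jax.vmap(lambda zm: gaussian_kernel(x, zm))(z))
    return f + log_q0(x, mu, log_sigma, beta)

def score_matching_objective(X, alpha, z, mu, log_sigma, beta):
    # J-hat = (1/|D_v|) sum_n sum_d [ d^2/dx_d^2 log p(x_n) + 0.5 * (d/dx_d log p(x_n))^2 ]
    def per_point(x):
        grad = jax.grad(log_p)(x, alpha, z, mu, log_sigma, beta)
        diag_hess = jnp.diag(jax.hessian(log_p)(x, alpha, z, mu, log_sigma, beta))
        return jnp.sum(diag_hess + 0.5 * grad ** 2)
    return jnp.mean(jax.vmap(per_point)(X))

if __name__ == "__main__":
    # toy example: D = 2 dimensions, M = 3 inducing points, 5 data points
    k1, k2 = jax.random.split(jax.random.PRNGKey(0))
    X = jax.random.normal(k1, (5, 2))
    z = jax.random.normal(k2, (3, 2))
    alpha = jnp.full(3, 0.1)
    mu, log_sigma, beta = jnp.zeros(2), jnp.zeros(2), jnp.full(2, 1.5)
    print(score_matching_objective(X, alpha, z, mu, log_sigma, beta))
```

Here Ĵ is evaluated on the full log p̂ = f + log q0 by automatic differentiation; Algorithm 1 writes the per-point terms in terms of f.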
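
The dataset-splits row mirrors the two-stage structure of Algorithm 1: α is always fit on data disjoint from the minibatch on which Ĵ is validated. The sketch below lays out only that structure; `init_params`, `fit_alpha` (the paper uses a closed-form, ridge-regularized solution for α), `j_hat`, and `sgd_step` are hypothetical callbacks, and the batch size and step counts are illustrative rather than the authors' settings.

```python
import numpy as np

def sample_disjoint(D1, batch, rng):
    # draw two non-overlapping minibatches D_t, D_v from D1
    idx = rng.permutation(len(D1))
    return D1[idx[:batch]], D1[idx[batch:2 * batch]]

def train_dkef(D, init_params, fit_alpha, j_hat, sgd_step, n_steps=1000, batch=200, seed=0):
    rng = np.random.default_rng(seed)
    D = rng.permutation(D)                       # shuffle rows before splitting
    D1, D2 = np.split(D, [len(D) // 2])          # "Split D into D1 and D2"

    params = init_params()                       # kernel weights w, lambda, inducing points z, q0 params
    # Stage 1: learn w, lambda, z (and maybe q0 params) by SGD on J-hat,
    # refitting alpha on a fresh minibatch D_t at every step
    for _ in range(n_steps):
        Dt, Dv = sample_disjoint(D1, batch, rng)
        objective = lambda p: j_hat(Dv, fit_alpha(p, Dt), p)
        params = sgd_step(params, objective)

    # Stage 2: fit alpha on all of D1 and tune the regularization lambda only,
    # validating on minibatches of the held-out split D2
    for _ in range(n_steps):
        Dv = D2[rng.choice(len(D2), batch, replace=False)]
        objective = lambda p: j_hat(Dv, fit_alpha(p, D1), p)
        params = sgd_step(params, objective, trainable=("lambda",))

    # Finalize alpha on D1 with the learned kernel and regularization
    return fit_alpha(params, D1), params
```

In stage 1 the objective closure refits α on D_t before scoring on D_v, so each SGD step moves the kernel, inducing points, and regularization toward good held-out Ĵ, which is what the Section 3 quote describes as a form of meta-learning; stage 2 then tunes λ alone against the held-out split D2.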