Amortized Variational Deep Kernel Learning
Authors: Alan L. S. Matias, César Lincoln Mattos, João Paulo Pordeus Gomes, Diego Mesquita
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that our resulting method, amortized variational DKL (AVDKL), i) consistently outperforms DKL and standard GPs for tabular data; ii) achieves significantly higher accuracy than DKL in node classification tasks; and iii) leads to substantially better accuracy and negative log-likelihood than DKL on CIFAR100. |
| Researcher Affiliation | Academia | ¹Federal University of Ceará, ²Getulio Vargas Foundation. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The full code for AVDKL and the experiments is available at https://github.com/alanlsmatias/amortized-variational-dkl. |
| Open Datasets | Yes | We evaluate the proposed AVDKL on a set of different tasks: tabular data, semi-supervised node classification and image classification. We start our experiments with popular classification and regression datasets from the UCI repository (Kelly et al., 2023). For the graph node classification task, we used three popular citation-network datasets: Cora, CiteSeer and PubMed. We train all models from scratch on the CIFAR10 and CIFAR100 datasets. |
| Dataset Splits | Yes | We split the datasets into five different train/test sets, with each split having 80% of the data for training and 20% for testing. We consider the fixed splits defined by Yang et al. (2016), with 20 nodes per class for training, 500 nodes for validation and 1000 nodes for testing. (A splitting sketch follows the table.) |
| Hardware Specification | Yes | Furthermore, all models were trained on an NVIDIA GeForce RTX 3060 with 12GB of VRAM and 16GB of RAM. |
| Software Dependencies | No | All models were implemented using PyTorch, PyTorch Geometric (Fey & Lenssen, 2019), GPyTorch (Gardner et al., 2018) and Neural Tangents (Novak et al., 2020; 2022; Han et al., 2022; Hron et al., 2020; Sohl-Dickstein et al., 2020). While specific libraries are mentioned, explicit version numbers for these software components (e.g., PyTorch 1.x.y) are not provided in the text. |
| Experiment Setup | Yes | The optimization was carried out using AdamW (Loshchilov & Hutter, 2019) with learning rate 0.01 and a weight decay of 0.001 for the NN's weights and biases. We also used a cosine annealing learning rate scheduler (Loshchilov & Hutter, 2017). For all NN-based models, AVDKL, GDKL, DLVKL and SVDKL, we used an NN with two layers of size [64, D], where the first layer includes batch normalization and a SiLU activation. Finally, we trained all models for 200 epochs using the AdamW optimizer with learning rate 0.005 (0.01 for the SVGP) and weight decay 0.001 (applied to the AVDKL, GDKL, DLVKL, and SVDKL networks). (A minimal training-setup sketch follows the table.) |
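
The tabular protocol quoted in the Dataset Splits row (five random 80/20 train/test splits) can be reproduced with a small helper like the one below. This is a hedged sketch, not the authors' code: the function name, the use of scikit-learn's `train_test_split`, and the choice of seeds 0–4 are assumptions.

```python
# Hypothetical sketch of the five random 80/20 train/test splits described for
# the UCI tabular experiments; seeds and the splitting utility are assumptions.
import numpy as np
from sklearn.model_selection import train_test_split

def make_splits(X, y, n_splits=5, test_size=0.2):
    """Return a list of (train_idx, test_idx) pairs, one per random split."""
    splits = []
    for seed in range(n_splits):
        idx = np.arange(len(X))
        train_idx, test_idx = train_test_split(
            idx, test_size=test_size, random_state=seed, shuffle=True
        )
        splits.append((train_idx, test_idx))
    return splits
```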
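
The Experiment Setup row quotes a two-layer feature extractor of size [64, D] with batch normalization and SiLU after the first layer, trained for 200 epochs with AdamW and a cosine annealing schedule. The PyTorch snippet below is a minimal sketch of that configuration only; the module construction, the placeholder dimensions, and the skeleton of the training loop are illustrative assumptions, and the actual implementation lives in the linked repository.

```python
# Minimal sketch of the reported training setup: two-layer NN [64, D] with
# batch normalization and SiLU after the first layer, AdamW (lr 0.005,
# weight decay 0.001), and cosine annealing over 200 epochs.
# Dimensions and the loop body are placeholders, not the authors' code.
import torch
import torch.nn as nn

def make_feature_extractor(in_dim: int, out_dim: int) -> nn.Module:
    return nn.Sequential(
        nn.Linear(in_dim, 64),
        nn.BatchNorm1d(64),
        nn.SiLU(),
        nn.Linear(64, out_dim),
    )

net = make_feature_extractor(in_dim=10, out_dim=2)  # placeholder dimensions
optimizer = torch.optim.AdamW(net.parameters(), lr=0.005, weight_decay=0.001)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)

for epoch in range(200):
    # ... compute the variational objective, then loss.backward() and optimizer.step() ...
    scheduler.step()
```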