Meta-learning Adaptive Deep Kernel Gaussian Processes for Molecular Property Prediction
Authors: Wenlin Chen, Austin Tripp, José Miguel Hernández-Lobato
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we evaluate the empirical performance of ADKF-IFT from Section 3.4. We choose to focus our experiments exclusively on molecular property prediction and optimization tasks because we believe this application would benefit greatly from better GP models: firstly because many existing methods struggle on small datasets of size $10^2$, which are ubiquitous in chemistry, and secondly because many tasks in chemistry require high-quality uncertainty estimates. First, we evaluate ADKF-IFT on four commonly used benchmark tasks from MoleculeNet (Wu et al., 2018), finding that ADKF-IFT achieves state-of-the-art results on most tasks (Section 5.1). Second, we evaluate ADKF-IFT on the larger-scale FS-Mol benchmark (Stanley et al., 2021), finding that ADKF-IFT is the best-performing method (Section 5.2). In particular, our results support the hypothesis from Section 3.4 that ADKF-IFT achieves a better balance between overfitting and underfitting than DKL and DKT. Finally, we show that the ADKF-IFT feature representation is transferable to out-of-domain molecular property prediction and optimization tasks (Section 5.3). |
| Researcher Affiliation | Academia | Wenlin Chen University of Cambridge MPI for Intelligent Systems wc337@cam.ac.uk Austin Tripp University of Cambridge ajt212@cam.ac.uk José Miguel Hernández-Lobato University of Cambridge jmh233@cam.ac.uk |
| Pseudocode | Yes | Algorithm 1: Exact hypergradient computation in ADKF-IFT. 1: Input: a training task $\mathcal{T}$ and the current meta-learned parameters $\psi_{\text{meta}}$. 2: Solve Equation (2) to obtain $\psi^*_{\text{adapt}} = \psi^*_{\text{adapt}}(\psi_{\text{meta}}, S_{\mathcal{T}})$. 3: Compute $g_1 = \partial L_V(\psi_{\text{meta}}, \psi_{\text{adapt}}, \mathcal{T}) / \partial \psi_{\text{meta}} \big|_{(\psi_{\text{meta}}, \psi^*_{\text{adapt}})}$ and $g_2 = \partial L_V(\psi_{\text{meta}}, \psi_{\text{adapt}}, \mathcal{T}) / \partial \psi_{\text{adapt}} \big|_{(\psi_{\text{meta}}, \psi^*_{\text{adapt}})}$ by auto-diff. 4: Compute the Hessian $H = \partial^2 L_T(\psi_{\text{meta}}, \psi_{\text{adapt}}, S_{\mathcal{T}}) / \partial \psi_{\text{adapt}} \partial \psi_{\text{adapt}}^T \big|_{(\psi_{\text{meta}}, \psi^*_{\text{adapt}})}$ by auto-diff. 5: Solve the linear system $v^T H = g_2^T$ for $v$. 6: Compute the mixed partial derivatives $P = \partial^2 L_T(\psi_{\text{meta}}, \psi_{\text{adapt}}, S_{\mathcal{T}}) / \partial \psi_{\text{adapt}} \partial \psi_{\text{meta}}^T \big|_{(\psi_{\text{meta}}, \psi^*_{\text{adapt}})}$ by auto-diff. 7: Output: the hypergradient $\mathrm{d}L_V / \mathrm{d}\psi_{\text{meta}} = g_1 - v^T P$. Equations (4) and (5). (A code sketch of this computation follows the table.) |
| Open Source Code | Yes | Our implementation and experimental results can be found at https://github.com/Wenlin-Chen/ADKF-IFT, which is based on forks of FS-Mol (Stanley et al., 2021) and PAR (Wang et al., 2021). |
| Open Datasets | Yes | First, we evaluate ADKF-IFT on four commonly used benchmark tasks from MoleculeNet (Wu et al., 2018), finding that ADKF-IFT achieves state-of-the-art results on most tasks (Section 5.1). Second, we evaluate ADKF-IFT on the larger-scale FS-Mol benchmark (Stanley et al., 2021), finding that ADKF-IFT is the best-performing method (Section 5.2). |
| Dataset Splits | Yes | FS-Mol contains over 5,000 tasks with 233,786 unique compounds from ChEMBL27 (Mendez et al., 2019), split into training (4,938 tasks), validation (40 tasks), and test (157 tasks) sets. |
| Hardware Specification | Yes | Figure 5 shows the meta-testing costs of all compared meta-learning methods in terms of wall-clock time on a pre-defined set of FS-Mol classification tasks. These experiments are run on a single NVIDIA GeForce RTX 2080 Ti. |
| Software Dependencies | No | The paper does not provide version numbers for its software dependencies; it refers to standard optimizers (Adam, L-BFGS) and libraries such as RDKit without specifying versions. |
| Experiment Setup | Yes | We solve the inner optimization problem (2) using the L-BFGS optimizer (Liu & Nocedal, 1989), since L-BFGS is the default choice for optimizing base kernel parameters in the GP literature. For the outer optimization problem (1), we approximate the expected hypergradient over $p(\mathcal{T})$ by averaging the hypergradients for a batch of K randomly sampled training tasks at each step, and update the meta-learned parameters $\psi_{\text{meta}}$ with the averaged hypergradient using the Adam optimizer (Kingma & Ba, 2014) with learning rate $10^{-3}$ for MoleculeNet and $10^{-4}$ for FS-Mol. We set K = 10 for MoleculeNet and K = 16 for FS-Mol. For all experiments on FS-Mol, we evaluate the performance of our model on a small set of validation tasks during meta-training and use early stopping (Prechelt, 1998) to avoid overfitting of $\psi_{\text{meta}}$. We use a zero mean function and a Matérn-5/2 base kernel without automatic relevance determination (ARD) (Neal, 1996) in ADKF-IFT, since the typical sizes of the support sets in few-shot learning are too small to adjust a relatively large number of ARD lengthscales. The lengthscale in the base kernel of ADKF-IFT is initialized using the median heuristic (Garreau et al., 2017) for each task, with a log-normal prior centered at the initialization. Following Patacchiola et al. (2020), we treat binary classification as ±1 label regression for ADKF-IFT. (Code sketches of the outer training loop and the lengthscale initialization follow the table.) |
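
Algorithm 1 (quoted in the Pseudocode row above) computes the exact hypergradient of the validation loss with respect to the meta-learned parameters via the implicit function theorem. Below is a minimal PyTorch sketch of that computation, assuming the inner and outer losses are supplied as callables over flattened parameter vectors; the function name and interface are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of Algorithm 1 (exact hypergradient via the implicit function
# theorem). The flattened-parameter interface is an illustrative assumption.
import torch

def ift_hypergradient(L_T, L_V, psi_meta, psi_adapt):
    """Compute dL_V/dpsi_meta at the inner optimum psi_adapt*(psi_meta, S_T).

    L_T: callable (psi_meta, psi_adapt) -> scalar inner (support-set) loss.
    L_V: callable (psi_meta, psi_adapt) -> scalar outer (validation) loss.
    psi_meta, psi_adapt: 1-D tensors with requires_grad=True; psi_adapt is
        assumed to already minimize L_T(psi_meta, .) (e.g. via L-BFGS).
    """
    # Line 3: direct and indirect gradients of the outer loss.
    g1, g2 = torch.autograd.grad(L_V(psi_meta, psi_adapt), (psi_meta, psi_adapt))

    # Line 4: Hessian of the inner loss w.r.t. the adapted parameters.
    H = torch.autograd.functional.hessian(lambda pa: L_T(psi_meta, pa), psi_adapt)

    # Line 5: solve v^T H = g2^T for v (H is symmetric at the inner optimum,
    # so this is equivalent to solving H v = g2).
    v = torch.linalg.solve(H, g2)

    # Line 6: mixed partials P = d^2 L_T / (dpsi_adapt dpsi_meta^T), applied to
    # v via a vector-Jacobian product so P is never formed explicitly.
    grad_adapt = torch.autograd.grad(L_T(psi_meta, psi_adapt), psi_adapt,
                                     create_graph=True)[0]
    vP = torch.autograd.grad(grad_adapt, psi_meta, grad_outputs=v)[0]

    # Line 7: hypergradient dL_V/dpsi_meta = g1 - v^T P.
    return g1 - vP
```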
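The Experiment Setup row describes the outer optimization: the expected hypergradient over tasks is approximated by averaging hypergradients for a batch of K randomly sampled training tasks, and the average is applied to the meta-learned parameters with Adam. A minimal sketch of such a loop is given below; the callables for task sampling, inner L-BFGS fitting, and hypergradient computation are assumptions standing in for the authors' code.

```python
# Minimal sketch of the outer training loop, assuming the inner problem and
# the hypergradient computation are provided as callables.
import torch

def outer_loop(meta_params, sample_tasks, fit_inner_lbfgs, hypergradient,
               num_steps, K=16, lr=1e-4):
    """meta_params: list of tensors (feature-extractor parameters, requires_grad=True).
    sample_tasks(K): returns K randomly sampled training tasks.
    fit_inner_lbfgs(task): solves Eq. (2) on the task's support set, returns psi_adapt.
    hypergradient(task, psi_adapt): returns dL_V/dpsi_meta as a list matching meta_params.
    """
    # lr = 1e-4 and K = 16 for FS-Mol; lr = 1e-3 and K = 10 for MoleculeNet.
    optimizer = torch.optim.Adam(meta_params, lr=lr)
    for _ in range(num_steps):
        optimizer.zero_grad()
        for task in sample_tasks(K):
            psi_adapt = fit_inner_lbfgs(task)          # inner problem (Eq. 2)
            for p, g in zip(meta_params, hypergradient(task, psi_adapt)):
                # Accumulate the averaged hypergradient into .grad for Adam.
                p.grad = g / K if p.grad is None else p.grad + g / K
        optimizer.step()
```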
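The base-kernel lengthscale is initialized per task with the median heuristic (Garreau et al., 2017). A minimal sketch follows, under the assumption that the heuristic is applied to pairwise Euclidean distances between the support-set deep features; the function name is hypothetical.

```python
# Minimal sketch of a per-task median-heuristic lengthscale initialization.
import torch

def median_heuristic_lengthscale(support_features: torch.Tensor) -> torch.Tensor:
    """support_features: (n, d) tensor of deep features for the task's support set."""
    dists = torch.cdist(support_features, support_features)       # (n, n) pairwise distances
    off_diag = dists[~torch.eye(len(dists), dtype=torch.bool)]    # drop the zero diagonal
    return off_diag.median()                                      # initial lengthscale
```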