Context-enriched molecule representations improve few-shot drug discovery

Authors: Johannes Schimunek, Philipp Seidl, Lukas Friedrich, Daniel Kuhn, Friedrich Rippmann, Sepp Hochreiter, Günter Klambauer

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We compare our approach with other few-shot methods for drug discovery on the FS-Mol benchmark dataset. On FS-Mol, our approach outperforms all compared methods and therefore sets a new state-of-the-art for few-shot learning in drug discovery. An ablation study shows that the enrichment step of our method is key to improving the predictive quality. In a domain-shift experiment, we further demonstrate the robustness of our method.
Researcher Affiliation | Collaboration | (1) ELLIS Unit Linz and LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria (schimunek@ml.jku.at); (2) Computational Chemistry & Biologics, Merck Healthcare, Darmstadt, Germany
Pseudocode | No | The paper describes the model architecture and its components but does not include formal pseudocode or an algorithm block.
Open Source Code | Yes | Code is available at https://github.com/ml-jku/MHNfs.
Open Datasets | Yes | Recently, the dataset FS-Mol (Stanley et al., 2021) was proposed to benchmark few-shot learning methods in drug discovery. It was extracted from ChEMBL27 and comprises a total of 489,133 measurements, 233,786 compounds, and 5,120 tasks. (A sketch of the resulting task structure is given after the table.)
Dataset Splits | Yes | The FS-Mol benchmark dataset defines 4,938 training, 40 validation, and 157 test tasks, guaranteeing disjoint task sets. (A disjointness check is sketched after the table.)
Hardware Specification | Yes | Training a single MHNfs model on the benchmark dataset FS-Mol takes roughly 90 hours of wall-clock time on an A100 GPU.
Software Dependencies | No | For the model implementations, we used PyTorch (Paszke et al., 2019, BSD license). We used PyTorch Lightning (Falcon et al., 2019, Apache 2.0 license) as a framework for training and test logic, hydra for config file handling (Yadan, 2019, Apache 2.0 license), and Weights & Biases (Biewald, 2020, MIT license) as an experiment tracking tool. (A snippet for recording installed versions follows the table.)
Experiment Setup | Yes | The main hyperparameters of our architecture are the number of heads, the embedding dimension, the dimension of the association space of the CAM and CM, the learning rate schedule, the scaling parameter β, and the molecule encoder. The following hyperparameters were selected by manual hyperparameter selection on the validation tasks. The molecule encoder consists of a single layer with output size d = 1024 and SELU activation (Klambauer et al., 2017). The CM consists of one Hopfield layer with 8 heads. The dimension e of the association space is set to 512 and β = 1/√e. Since we use skip connections between all modules, the output dimension of the CM and CAM matches the input dimension. The CAM comprises one layer with 8 heads and an association-space dimension of 1088. For the input to the CAM, an activity encoding was added to the support set molecule representations to provide label information. The SM uses τ = 32. For the context set, we randomly sample 5% from a large set of molecules, i.e., the molecules in the FS-Mol training split, for each batch. For inference, we used a fixed set of 5% of training-set molecules as the context set for each seed. We provide considered and selected hyperparameters in Appendix A.1.6. (These settings are collected in the configuration sketch after the table.)
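
To make the FS-Mol task structure from the Open Datasets row concrete, here is a minimal Python sketch of what a single few-shot task looks like. The class and field names are illustrative assumptions, not the benchmark's actual schema, and the identifier and SMILES strings are placeholders.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class FewShotTask:
        # One FS-Mol-style task: a single assay/target with labelled molecules.
        # Field names are illustrative, not the benchmark's actual schema.
        task_id: str                 # e.g. an assay identifier (placeholder below)
        support_smiles: List[str]    # the few labelled molecules available
        support_labels: List[int]    # 1 = active, 0 = inactive
        query_smiles: List[str]      # molecules whose activity must be predicted

    task = FewShotTask(
        task_id="TASK_0000",         # hypothetical identifier
        support_smiles=["CCO", "c1ccccc1"],
        support_labels=[1, 0],
        query_smiles=["CCN"],
    )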
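The split sizes and disjointness guarantee from the Dataset Splits row can be checked mechanically. A minimal sketch, assuming the task IDs of each split are available as Python sets (the IDs below are placeholders with the reported split sizes):

    # Placeholder task-ID sets with the sizes reported for FS-Mol
    # (4,938 training / 40 validation / 157 test tasks).
    train_tasks = {f"train_task_{i}" for i in range(4938)}
    valid_tasks = {f"valid_task_{i}" for i in range(40)}
    test_tasks  = {f"test_task_{i}"  for i in range(157)}

    # Pairwise disjointness, as the benchmark guarantees.
    assert train_tasks.isdisjoint(valid_tasks)
    assert train_tasks.isdisjoint(test_tasks)
    assert valid_tasks.isdisjoint(test_tasks)
    assert (len(train_tasks), len(valid_tasks), len(test_tasks)) == (4938, 40, 157)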
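Since the Software Dependencies row is marked "No" (the packages are named but their versions are not pinned), a reproducer may want to record the versions actually installed. A minimal sketch covering the four packages the paper names:

    # Log the installed versions of the dependencies the paper names;
    # exact versions are not specified in the paper, so record your own.
    import torch
    import pytorch_lightning
    import hydra
    import wandb

    for module in (torch, pytorch_lightning, hydra, wandb):
        print(module.__name__, getattr(module, "__version__", "unknown"))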
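The Experiment Setup row pins down most architectural hyperparameters. The PyTorch sketch below collects them in one place and approximates the cross-attention module (CAM) with a standard multi-head attention layer plus a skip connection; the Hopfield-layer internals, class names, and the input feature dimension are assumptions, not the authors' implementation.

    import torch
    import torch.nn as nn

    # Hyperparameters as reported in the Experiment Setup row.
    ENCODER_DIM = 1024                  # molecule encoder output size d
    CM_HEADS = 8                        # context module (CM) Hopfield heads
    CM_ASSOC_DIM = 512                  # association-space dimension e
    BETA = 1.0 / CM_ASSOC_DIM ** 0.5    # beta = 1 / sqrt(e)
    CAM_HEADS = 8                       # cross-attention module (CAM) heads
    CAM_ASSOC_DIM = 1088                # CAM association-space dimension
    SM_TAU = 32.0                       # similarity module (SM) scaling tau
    CONTEXT_FRACTION = 0.05             # 5% of training molecules as context

    class MoleculeEncoder(nn.Module):
        """Single fully connected layer with SELU activation, as described.
        The input dimension (here 2048) is an assumption."""
        def __init__(self, in_dim: int = 2048, out_dim: int = ENCODER_DIM):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(in_dim, out_dim), nn.SELU())

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.net(x)

    class CrossAttentionModule(nn.Module):
        """Stand-in for the paper's Hopfield-based CAM: standard multi-head
        cross-attention with a skip connection, so the output dimension
        matches the input dimension, as the setup description states."""
        def __init__(self, dim: int = ENCODER_DIM, heads: int = CAM_HEADS):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

        def forward(self, query_mols: torch.Tensor,
                    support_mols: torch.Tensor) -> torch.Tensor:
            enriched, _ = self.attn(query_mols, support_mols, support_mols)
            return query_mols + enriched   # skip connection

    # Usage sketch with random features in place of real molecule descriptors.
    encoder, cam = MoleculeEncoder(), CrossAttentionModule()
    query = encoder(torch.randn(1, 16, 2048))     # 16 query molecules
    support = encoder(torch.randn(1, 8, 2048))    # 8 support molecules
    print(cam(query, support).shape)              # torch.Size([1, 16, 1024])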