Multi-Scale Representation Learning on Proteins

Authors: Vignesh Ram Somnath, Charlotte Bunne, Andreas Krause

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We test the learned representation on different tasks, (i.) ligand binding affinity (regression), and (ii.) protein function prediction (classification). On the regression task, contrary to previous methods, our model performs consistently and reliably across different dataset splits, outperforming all baselines on most splits. On the classification task, it achieves a performance close to the top-performing model while using 10x fewer parameters.
Researcher Affiliation | Academia | Vignesh Ram Somnath, Dept. of Computer Science, ETH Zurich (vsomnath@ethz.ch); Charlotte Bunne, Dept. of Computer Science, ETH Zurich (bunnec@ethz.ch); Andreas Krause, Dept. of Computer Science, ETH Zurich (krausea@ethz.ch)
Pseudocode | No | The paper describes the architecture and processes in text and mathematical equations, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement about the release of its own source code or a link to a repository.
Open Datasets | Yes | Dataset. The PDBBIND database (version 2019) [Liu et al., 2017] is a collection of the experimentally measured binding affinity data for all types of biomolecular complexes deposited in the Protein Data Bank [Berman et al., 2000].
Dataset Splits | Yes | We split the dataset into training, test and validation splits based on the scaffolds of the corresponding ligands (scaffold), or a 30% and a 60% sequence identity threshold (identity 30%, identity 60%) to limit homologous ligands or proteins appearing in both train and test sets.
Hardware Specification | No | The paper does not specify the hardware (e.g., CPU or GPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions various software components and models (e.g., MSMS, MPN, MLP, GCN), but does not provide specific version numbers for any of them.
Experiment Setup | No | The paper describes the model architecture and general components (e.g., K iterations of message passing) but does not provide specific hyperparameter values such as learning rates, batch sizes, or optimizer settings.
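The scaffold-based split quoted above assigns whole scaffold groups to a single partition so that no ligand scaffold appears in both train and test. A minimal sketch of this idea in pure Python, assuming a caller-supplied `scaffold_of` function (the paper's exact splitting procedure and thresholds are not reproduced here):

```python
import random
from collections import defaultdict

def scaffold_split(items, scaffold_of, frac_train=0.8, frac_valid=0.1, seed=0):
    """Group items by scaffold and assign whole groups to train/valid/test,
    so no scaffold is shared between splits. Illustrative only."""
    groups = defaultdict(list)
    for item in items:
        groups[scaffold_of(item)].append(item)
    # Visit scaffold groups in a deterministic shuffled order.
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)
    n = len(items)
    train, valid, test = [], [], []
    for key in keys:
        bucket = groups[key]
        if len(train) + len(bucket) <= frac_train * n:
            train.extend(bucket)
        elif len(valid) + len(bucket) <= frac_valid * n:
            valid.extend(bucket)
        else:
            test.extend(bucket)
    return train, valid, test
```

A sequence-identity split works analogously, with identity clusters (e.g., at a 30% or 60% threshold) playing the role of scaffold groups.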
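The "K iterations of message passing" mentioned in the Experiment Setup row refers to the standard graph neural network update. A framework-free sketch with sum aggregation and a fixed tanh update standing in for the paper's learned MPN update (all specifics here are illustrative assumptions, not the authors' architecture):

```python
import math

def message_passing(node_feats, edges, K=3):
    """Run K rounds of sum-aggregation message passing.

    node_feats: list of feature vectors (lists of floats);
    edges: list of directed (src, dst) index pairs.
    """
    h = [list(v) for v in node_feats]
    for _ in range(K):
        # Each node sums the current features of its in-neighbors.
        msgs = [[0.0] * len(v) for v in h]
        for src, dst in edges:
            for i, x in enumerate(h[src]):
                msgs[dst][i] += x
        # Node update: combine own state with aggregated messages.
        h = [[math.tanh(x + m) for x, m in zip(hv, mv)]
             for hv, mv in zip(h, msgs)]
    return h
```

After K rounds, each node's feature vector reflects its K-hop neighborhood, which is what makes K a key hyperparameter to report.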