CLIPZyme: Reaction-Conditioned Virtual Screening of Enzymes

Authors: Peter Mikhael, Itamar Chinn, Regina Barzilay

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through a contrastive objective, we train CLIPZyme to encode and align representations of enzyme structures and reaction pairs. With no standard computational baseline, we compare CLIPZyme to existing EC (enzyme commission) predictors applied to virtual enzyme screening and show improved performance in scenarios where limited information on the reaction is available (BEDROC85 of 44.69%). Additionally, we evaluate combining EC predictors with CLIPZyme and show its generalization capacity on both unseen reactions and protein clusters.
Researcher Affiliation | Academia | Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, U.S.A.
Pseudocode | No | The paper describes implementation details in Appendices C and D but does not provide any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | We make our code and data available at https://github.com/pgmikhael/clipzyme.
Open Datasets | Yes | Our method is developed on the EnzymeMap dataset (Heid et al., 2023), which includes biochemical reactions linked with associated UniProt IDs and their respective EC numbers. ... We downloaded EnzymeMap from https://github.com/hesther/enzymemap, and the Terpene Synthase data from https://zenodo.org/records/10359046.
Dataset Splits | Yes | We divide our dataset into training, development, and testing groups based on these reaction rules (Figure 2). ... Table 10 (statistics of the EnzymeMap dataset used to develop CLIPZyme after pre-processing): the training / development / test splits contain 34,427 / 7,287 / 4,642 samples, 12,629 / 2,669 / 1,554 reactions, 9,794 / 1,964 / 1,407 proteins, and 2,251 / 465 / 319 EC numbers.
Hardware Specification | Yes | We train all models on 8 NVIDIA A6000 GPUs.
Software Dependencies | Yes | All models are developed in PyTorch v2.0.1 (Paszke et al., 2019) and trained using PyTorch Lightning v2.0.9 (Falcon & The PyTorch Lightning team, 2019).
Experiment Setup | Yes | All models are trained with a batch size of 64 with bfloat16 precision and trained until convergence (approximately 30 epochs). We use a learning rate of 1e-4 with a cosine learning rate schedule and 100 steps of linear warm-up. Warm-up starts with a learning rate of 1e-6, and the minimum learning rate after warm-up is set to 1e-5. We use the AdamW optimizer (Loshchilov & Hutter, 2017) with a weight decay of 0.05 and (β1, β2) = (0.9, 0.999).
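The contrastive objective quoted under Research Type is a CLIP-style alignment of enzyme and reaction embeddings. A minimal sketch of a symmetric InfoNCE loss over a batch of paired embeddings is given below; the exact loss formulation, temperature, and encoder outputs used by CLIPZyme are not stated in the excerpts above, so the function name, temperature value, and embedding shapes here are illustrative assumptions rather than the paper's implementation.

import torch
import torch.nn.functional as F

def clip_style_loss(enzyme_emb: torch.Tensor,
                    reaction_emb: torch.Tensor,
                    temperature: float = 0.07) -> torch.Tensor:
    # L2-normalize both sets of embeddings so the dot product is a cosine similarity.
    enzyme_emb = F.normalize(enzyme_emb, dim=-1)
    reaction_emb = F.normalize(reaction_emb, dim=-1)
    # Pairwise similarity matrix between all enzymes and reactions in the batch.
    logits = enzyme_emb @ reaction_emb.t() / temperature
    # Matched enzyme-reaction pairs sit on the diagonal.
    targets = torch.arange(logits.size(0), device=logits.device)
    # Symmetric cross-entropy: enzyme-to-reaction and reaction-to-enzyme.
    loss_e2r = F.cross_entropy(logits, targets)
    loss_r2e = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_e2r + loss_r2e)

At screening time, the same similarity matrix can be used to rank candidate enzymes against a query reaction, which is the setting in which the BEDROC metric quoted above is computed.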
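The hyperparameters listed under Experiment Setup (AdamW with weight decay 0.05 and betas (0.9, 0.999), linear warm-up from 1e-6 to 1e-4 over 100 steps, then cosine decay to a floor of 1e-5) can be mirrored in plain PyTorch as sketched below. This is not the authors' implementation (they train with PyTorch Lightning), and the total step count and the stand-in model are placeholders, not values from the paper.

import math
import torch

# Hyperparameters quoted in the Experiment Setup row.
base_lr, warmup_start_lr, min_lr = 1e-4, 1e-6, 1e-5
warmup_steps, total_steps = 100, 30_000  # total_steps is a placeholder

model = torch.nn.Linear(128, 128)  # stand-in for the CLIPZyme encoders
optimizer = torch.optim.AdamW(
    model.parameters(), lr=base_lr, weight_decay=0.05, betas=(0.9, 0.999)
)

def lr_lambda(step: int) -> float:
    # Linear warm-up from warmup_start_lr to base_lr, then cosine decay to min_lr.
    if step < warmup_steps:
        frac = step / max(1, warmup_steps)
        lr = warmup_start_lr + frac * (base_lr - warmup_start_lr)
    else:
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        lr = min_lr + 0.5 * (1.0 + math.cos(math.pi * progress)) * (base_lr - min_lr)
    return lr / base_lr  # LambdaLR multiplies the optimizer's base lr by this factor

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
# Call scheduler.step() once per training step (batch) to advance the schedule.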