CLIPZyme: Reaction-Conditioned Virtual Screening of Enzymes
Authors: Peter Mikhael, Itamar Chinn, Regina Barzilay
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through a contrastive objective, we train CLIPZyme to encode and align representations of enzyme structures and reaction pairs. With no standard computational baseline, we compare CLIPZyme to existing EC (enzyme commission) predictors applied to virtual enzyme screening and show improved performance in scenarios where limited information on the reaction is available (BEDROC85 of 44.69%). Additionally, we evaluate combining EC predictors with CLIPZyme and show its generalization capacity on both unseen reactions and protein clusters. (A minimal sketch of this contrastive objective is given below the table.) |
| Researcher Affiliation | Academia | Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, U.S.A. |
| Pseudocode | No | The paper describes implementation details in Appendices C and D but does not provide any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | We make our code and data available at https://github.com/pgmikhael/clipzyme. |
| Open Datasets | Yes | Our method is developed on the EnzymeMap dataset (Heid et al., 2023), which includes biochemical reactions linked with associated UniProt IDs and their respective EC numbers. ... We downloaded EnzymeMap from https://github.com/hesther/enzymemap, and the Terpene Synthase data from https://zenodo.org/records/10359046. |
| Dataset Splits | Yes | We divide our dataset into training, development, and testing groups based on these reaction rules (Figure 2). ... Table 10 (statistics of the EnzymeMap dataset used to develop CLIPZyme after pre-processing): training split: 34,427 samples, 12,629 reactions, 9,794 proteins, 2,251 ECs; development split: 7,287 samples, 2,669 reactions, 1,964 proteins, 465 ECs; test split: 4,642 samples, 1,554 reactions, 1,407 proteins, 319 ECs. |
| Hardware Specification | Yes | We train all models on 8 NVIDIA A6000 GPUs. |
| Software Dependencies | Yes | All models are developed in PyTorch v2.0.1 (Paszke et al., 2019) and trained using PyTorch Lightning v2.0.9 (Falcon & The PyTorch Lightning team, 2019). |
| Experiment Setup | Yes | All models are trained with a batch size of 64 with bfloat16 precision and trained until convergence (approximately 30 epochs). We use a learning rate of 1e-4 with a cosine learning rate schedule and 100 steps of linear warm-up. Warm-up starts with a learning rate of 1e-6, and the minimum learning rate after warm-up is set to 1e-5. We use the AdamW optimizer (Loshchilov & Hutter, 2017) with a weight decay of 0.05 and (β1, β2) = (0.9, 0.999). (A minimal optimizer and schedule sketch is given below the table.) |
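
The Research Type row above quotes the paper's contrastive training of enzyme and reaction representations. Below is a minimal sketch of a CLIP-style contrastive objective over a batch of matched (enzyme, reaction) embedding pairs; it assumes both encoders already produce fixed-size vectors, and all names here are illustrative rather than taken from the CLIPZyme codebase.

```python
# Minimal sketch of a CLIP-style contrastive objective between enzyme and
# reaction embeddings. Assumes precomputed batch embeddings; does not
# reproduce the paper's actual encoders.
import torch
import torch.nn.functional as F

def clip_style_loss(enzyme_emb: torch.Tensor,
                    reaction_emb: torch.Tensor,
                    temperature: float = 0.07) -> torch.Tensor:
    """Symmetric cross-entropy over cosine-similarity logits for a batch
    of matched (enzyme, reaction) pairs."""
    # L2-normalize so the dot product is a cosine similarity.
    enzyme_emb = F.normalize(enzyme_emb, dim=-1)
    reaction_emb = F.normalize(reaction_emb, dim=-1)

    # Pairwise similarity matrix: entry (i, j) scores enzyme i against reaction j.
    logits = enzyme_emb @ reaction_emb.t() / temperature

    # Matched pairs lie on the diagonal.
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_e2r = F.cross_entropy(logits, targets)      # enzymes -> reactions
    loss_r2e = F.cross_entropy(logits.t(), targets)  # reactions -> enzymes
    return 0.5 * (loss_e2r + loss_r2e)
```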
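The Experiment Setup row reports the optimizer and learning-rate schedule. The sketch below wires those reported values together with standard PyTorch schedulers; the model, total step count, and training loop are placeholders (the paper's actual pipeline is built on PyTorch Lightning), so treat this as an assumption-laden illustration, not the authors' implementation.

```python
# Sketch of the reported optimization setup: AdamW, 100-step linear warm-up
# from 1e-6 to 1e-4, cosine decay to 1e-5, bfloat16 precision, batch size 64.
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LinearLR, CosineAnnealingLR, SequentialLR

model = torch.nn.Linear(128, 128)   # placeholder for the CLIPZyme encoders
total_steps = 10_000                # placeholder; paper trains ~30 epochs

optimizer = AdamW(model.parameters(),
                  lr=1e-4,          # peak learning rate
                  weight_decay=0.05,
                  betas=(0.9, 0.999))

# 100 linear warm-up steps from 1e-6 up to the peak rate of 1e-4 ...
warmup = LinearLR(optimizer, start_factor=1e-6 / 1e-4, end_factor=1.0, total_iters=100)
# ... then cosine decay down to the reported minimum of 1e-5.
cosine = CosineAnnealingLR(optimizer, T_max=total_steps - 100, eta_min=1e-5)
scheduler = SequentialLR(optimizer, schedulers=[warmup, cosine], milestones=[100])

for step in range(total_steps):
    # bfloat16 autocast matches the reported precision; device_type would be
    # "cuda" on the A6000 GPUs mentioned in the paper.
    with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
        loss = model(torch.randn(64, 128)).pow(2).mean()  # batch of 64, dummy loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    scheduler.step()
```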