CLIPZyme: Reaction-Conditioned Virtual Screening of Enzymes
Authors: Peter Mikhael, Itamar Chinn, Regina Barzilay
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through a contrastive objective, we train CLIPZyme to encode and align representations of enzyme structures and reaction pairs. With no standard computational baseline, we compare CLIPZyme to existing EC (enzyme commission) predictors applied to virtual enzyme screening and show improved performance in scenarios where limited information on the reaction is available (BEDROC85 of 44.69%). Additionally, we evaluate combining EC predictors with CLIPZyme and show its generalization capacity on both unseen reactions and protein clusters. (A minimal sketch of this contrastive objective is given below the table.) |
| Researcher Affiliation | Academia | Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, U.S.A. |
| Pseudocode | No | The paper describes implementation details in Appendices C and D but does not provide any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | We make our code and data available at https://github.com/pgmikhael/clipzyme. |
| Open Datasets | Yes | Our method is developed on the EnzymeMap dataset (Heid et al., 2023), which includes biochemical reactions linked with associated UniProt IDs and their respective EC numbers. ... We downloaded EnzymeMap from https://github.com/hesther/enzymemap, and the Terpene Synthase data from https://zenodo.org/records/10359046. |
| Dataset Splits | Yes | We divide our dataset into training, development, and testing groups based on these reaction rules (Figure 2). ... Table 10 (statistics of the EnzymeMap dataset used to develop CLIPZyme after pre-processing): training split: 34,427 samples, 12,629 reactions, 9,794 proteins, 2,251 ECs; development split: 7,287 samples, 2,669 reactions, 1,964 proteins, 465 ECs; test split: 4,642 samples, 1,554 reactions, 1,407 proteins, 319 ECs. |
| Hardware Specification | Yes | We train all models on 8 NVIDIA A6000 GPUs. |
| Software Dependencies | Yes | All models are developed in PyTorch v2.0.1 (Paszke et al., 2019) and trained using PyTorch Lightning v2.0.9 (Falcon & The PyTorch Lightning team, 2019). |
| Experiment Setup | Yes | All models are trained with a batch size of 64 with bfloat16 precision and trained until convergence (approximately 30 epochs). We use a learning rate of 1e-4 with a cosine learning rate schedule and 100 steps of linear warm-up. Warm-up starts with a learning rate of 1e-6, and the minimum learning rate after warm-up is set to 1e-5. We use the AdamW optimizer (Loshchilov & Hutter, 2017) with a weight decay of 0.05 and (β1, β2) = (0.9, 0.999). (A minimal optimizer and schedule sketch is given below the table.) |
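
The Research Type row above quotes the paper's contrastive training of enzyme and reaction representations. Below is a minimal sketch of a CLIP-style contrastive objective over a batch of matched (enzyme, reaction) embedding pairs; it assumes both encoders already produce fixed-size vectors, and all names here are illustrative rather than taken from the CLIPZyme codebase.

```python
# Minimal sketch of a CLIP-style contrastive objective between enzyme and
# reaction embeddings. Assumes precomputed batch embeddings; does not
# reproduce the paper's actual encoders.
import torch
import torch.nn.functional as F

def clip_style_loss(enzyme_emb: torch.Tensor,
                    reaction_emb: torch.Tensor,
                    temperature: float = 0.07) -> torch.Tensor:
    """Symmetric cross-entropy over cosine-similarity logits for a batch
    of matched (enzyme, reaction) pairs."""
    # L2-normalize so the dot product is a cosine similarity.
    enzyme_emb = F.normalize(enzyme_emb, dim=-1)
    reaction_emb = F.normalize(reaction_emb, dim=-1)

    # Pairwise similarity matrix: entry (i, j) scores enzyme i against reaction j.
    logits = enzyme_emb @ reaction_emb.t() / temperature

    # Matched pairs lie on the diagonal.
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_e2r = F.cross_entropy(logits, targets)      # enzymes -> reactions
    loss_r2e = F.cross_entropy(logits.t(), targets)  # reactions -> enzymes
    return 0.5 * (loss_e2r + loss_r2e)
```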
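The Experiment Setup row reports the optimizer and learning-rate schedule. The sketch below wires those reported values together with standard PyTorch schedulers; the model, total step count, and training loop are placeholders (the paper's actual pipeline is built on PyTorch Lightning), so treat this as an assumption-laden illustration, not the authors' implementation.

```python
# Sketch of the reported optimization setup: AdamW, 100-step linear warm-up
# from 1e-6 to 1e-4, cosine decay to 1e-5, bfloat16 precision, batch size 64.
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LinearLR, CosineAnnealingLR, SequentialLR

model = torch.nn.Linear(128, 128)   # placeholder for the CLIPZyme encoders
total_steps = 10_000                # placeholder; paper trains ~30 epochs

optimizer = AdamW(model.parameters(),
                  lr=1e-4,          # peak learning rate
                  weight_decay=0.05,
                  betas=(0.9, 0.999))

# 100 linear warm-up steps from 1e-6 up to the peak rate of 1e-4 ...
warmup = LinearLR(optimizer, start_factor=1e-6 / 1e-4, end_factor=1.0, total_iters=100)
# ... then cosine decay down to the reported minimum of 1e-5.
cosine = CosineAnnealingLR(optimizer, T_max=total_steps - 100, eta_min=1e-5)
scheduler = SequentialLR(optimizer, schedulers=[warmup, cosine], milestones=[100])

for step in range(total_steps):
    # bfloat16 autocast matches the reported precision; device_type would be
    # "cuda" on the A6000 GPUs mentioned in the paper.
    with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
        loss = model(torch.randn(64, 128)).pow(2).mean()  # batch of 64, dummy loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    scheduler.step()
```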