Generative Enzyme Design Guided by Functionally Important Sites and Small-Molecule Substrates

Authors: Zhenqiao Song, Yunlong Zhao, Wenxian Shi, Wengong Jin, Yang Yang, Lei Li

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that our EnzyGen consistently achieves the best performance across all 323 testing families, surpassing the best baseline by 10.79% in terms of substrate binding affinity.
Researcher Affiliation | Academia | (1) Language Technologies Institute, Carnegie Mellon University; (2) Department of Chemistry, Massachusetts Institute of Technology; (3) Department of EECS, Massachusetts Institute of Technology; (4) Broad Institute of MIT and Harvard; (5) Department of Chemistry and Biochemistry, University of California Santa Barbara.
Pseudocode | No | The paper includes architectural diagrams (Figure 1) but does not provide any pseudocode or algorithm blocks.
Open Source Code | Yes | The code, model and dataset are released at https://github.com/LeiLiLab/EnzyGen.
Open Datasets | Yes | We further construct EnzyBench, a dataset with 3157 enzyme families, covering all available enzymes within the protein data bank (PDB). The code, model and dataset are released at https://github.com/LeiLiLab/EnzyGen.
Dataset Splits | Yes | Then we select 30 third-level categories for validation and testing, respectively including 428 and 323 fourth-level categories. For each of the 30 third-level categories, we randomly split 100 PDB entries with 50 for validation and 50 for testing, while the remaining entries are utilized for training. (A split sketch follows the table.)
Hardware Specification | Yes | The model undergoes training for 1,000,000 steps using 8 NVIDIA RTX A6000 GPU cards.
Software Dependencies | No | The paper mentions initializing parameters with '650M ESM-2 parameters (Lin et al., 2022b)' but does not provide specific version numbers for software dependencies such as Python, PyTorch, or other libraries used in the implementation.
Experiment Setup | Yes | The hyperparameters λ/2 and K are set to 1.0 and 30, respectively. The model undergoes training for 1,000,000 steps... The batch size and learning rate are set to 8192 tokens and 3e-4 respectively. (A configuration sketch follows the table.)
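
The Dataset Splits row describes a per-category hold-out of 100 PDB entries (50 for validation, 50 for testing), with the remainder used for training. The Python sketch below illustrates that procedure under stated assumptions; the function name, random seed, and the mapping from third-level categories to PDB ids are hypothetical and do not come from the released EnzyGen code.

```python
import random
from collections import defaultdict

def split_enzybench(entries, seed=42):
    """Hypothetical sketch of the split described in the paper:
    for each selected third-level category, 100 PDB entries are held out
    (50 validation, 50 test) and the remaining entries go to training.
    `entries` maps a third-level category id to a list of PDB entry ids."""
    rng = random.Random(seed)
    splits = {"train": defaultdict(list), "valid": defaultdict(list), "test": defaultdict(list)}
    for category, pdb_ids in entries.items():
        pdb_ids = list(pdb_ids)
        rng.shuffle(pdb_ids)
        held_out = pdb_ids[:100]                    # 100 entries held out per category
        splits["valid"][category] = held_out[:50]   # 50 for validation
        splits["test"][category] = held_out[50:]    # 50 for testing
        splits["train"][category] = pdb_ids[100:]   # remaining entries for training
    return splits
```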
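Likewise, the hyperparameters reported in the Experiment Setup and Hardware Specification rows can be collected into a single configuration for reference. This is a minimal sketch assuming a flat dictionary of settings; the field names are illustrative and are not the actual arguments of the released EnzyGen training scripts.

```python
# Hypothetical training configuration mirroring the reported hyperparameters;
# field names are illustrative, not the released code's actual arguments.
config = {
    "init_checkpoint": "esm2_t33_650M",  # parameters initialized from 650M ESM-2
    "max_steps": 1_000_000,              # training for 1,000,000 steps
    "batch_size_tokens": 8192,           # batch size of 8192 tokens
    "learning_rate": 3e-4,
    "lambda_weight": 1.0,                # loss-weighting hyperparameter reported as 1.0
    "knn_neighbors_K": 30,               # K reported as 30
    "num_gpus": 8,                       # 8 x NVIDIA RTX A6000
}
```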