Generative Enzyme Design Guided by Functionally Important Sites and Small-Molecule Substrates

Authors: Zhenqiao Song, Yunlong Zhao, Wenxian Shi, Wengong Jin, Yang Yang, Lei Li

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that our EnzyGen consistently achieves the best performance across all 323 testing families, surpassing the best baseline by 10.79% in terms of substrate binding affinity.
Researcher Affiliation | Academia | (1) Language Technologies Institute, Carnegie Mellon University; (2) Department of Chemistry, Massachusetts Institute of Technology; (3) Department of EECS, Massachusetts Institute of Technology; (4) Broad Institute of MIT and Harvard; (5) Department of Chemistry and Biochemistry, University of California Santa Barbara.
Pseudocode | No | The paper includes architectural diagrams (Figure 1) but does not provide any pseudocode or algorithm blocks.
Open Source Code | Yes | The code, model and dataset are released at https://github.com/LeiLiLab/EnzyGen.
Open Datasets | Yes | We further construct EnzyBench, a dataset with 3157 enzyme families, covering all available enzymes within the protein data bank (PDB). The code, model and dataset are released at https://github.com/LeiLiLab/EnzyGen.
Dataset Splits | Yes | Then we select 30 third-level categories for validation and testing, respectively including 428 and 323 fourth-level categories. For each of the 30 third-level categories, we randomly split 100 PDB entries with 50 for validation and 50 for testing, while the remaining entries are utilized for training. (A split sketch follows the table.)
Hardware Specification | Yes | The model undergoes training for 1,000,000 steps using 8 NVIDIA RTX A6000 GPU cards.
Software Dependencies | No | The paper mentions initializing parameters with '650M ESM-2 parameters (Lin et al., 2022b)' but does not provide specific version numbers for software dependencies such as Python, PyTorch, or other libraries used in the implementation.
Experiment Setup | Yes | The hyperparameters λ/2 and K are set to 1.0 and 30, respectively. The model undergoes training for 1,000,000 steps... The batch size and learning rate are set to 8192 tokens and 3e-4 respectively. (A configuration sketch follows the table.)
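
The Dataset Splits row describes a per-category hold-out of 100 PDB entries (50 for validation, 50 for testing), with the remainder used for training. The Python sketch below illustrates that procedure under stated assumptions; the function name, random seed, and the mapping from third-level categories to PDB ids are hypothetical and do not come from the released EnzyGen code.

```python
import random
from collections import defaultdict

def split_enzybench(entries, seed=42):
    """Hypothetical sketch of the split described in the paper:
    for each selected third-level category, 100 PDB entries are held out
    (50 validation, 50 test) and the remaining entries go to training.
    `entries` maps a third-level category id to a list of PDB entry ids."""
    rng = random.Random(seed)
    splits = {"train": defaultdict(list), "valid": defaultdict(list), "test": defaultdict(list)}
    for category, pdb_ids in entries.items():
        pdb_ids = list(pdb_ids)
        rng.shuffle(pdb_ids)
        held_out = pdb_ids[:100]                    # 100 entries held out per category
        splits["valid"][category] = held_out[:50]   # 50 for validation
        splits["test"][category] = held_out[50:]    # 50 for testing
        splits["train"][category] = pdb_ids[100:]   # remaining entries for training
    return splits
```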
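Likewise, the hyperparameters reported in the Experiment Setup and Hardware Specification rows can be collected into a single configuration for reference. This is a minimal sketch assuming a flat dictionary of settings; the field names are illustrative and are not the actual arguments of the released EnzyGen training scripts.

```python
# Hypothetical training configuration mirroring the reported hyperparameters;
# field names are illustrative, not the released code's actual arguments.
config = {
    "init_checkpoint": "esm2_t33_650M",  # parameters initialized from 650M ESM-2
    "max_steps": 1_000_000,              # training for 1,000,000 steps
    "batch_size_tokens": 8192,           # batch size of 8192 tokens
    "learning_rate": 3e-4,
    "lambda_weight": 1.0,                # loss-weighting hyperparameter reported as 1.0
    "knn_neighbors_K": 30,               # K reported as 30
    "num_gpus": 8,                       # 8 x NVIDIA RTX A6000
}
```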