Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

EnzyControl: Adding Functional and Substrate-Specific Control for Enzyme Backbone Generation

Authors: Chao Song, Zhiyuan Liu, Han Huang, Liang Wang, Qiong Wang, Jian-Yu Shi, Hui Yu, Yihang Zhou, Yang Zhang

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We benchmark EnzyControl on EnzyBind, evaluating the generated enzyme backbones across multiple structural and functional metrics. Experiments show that EnzyControl achieves 0.7160 in designability, a significant 13% relative improvement compared to the second-best model (see Table 1). It also demonstrates significantly improved catalytic efficiency (kcat) and functional alignment (EC match rate), achieving 13% and 10% improvements, respectively, over the next-best baselines. EnzyControl also achieves a 3% improvement in binding affinity over the second-best model on EnzyBench (Table 7). Additional quantitative analyses further highlight its strong residue efficiency (Fig. 9). In particular, EnzyControl consistently generates sequences that are approximately 30% shorter while maintaining comparable kcat values across all catalytic efficiency ranges, indicating its ability to produce compact, functionally robust designs suitable for practical applications.
Researcher Affiliation	Academia	Chao Song1, Zhiyuan Liu2, Han Huang3, Liang Wang4, Qiong Wang1, Jianyu Shi1, Hui Yu1, Yihang Zhou2, Yang Zhang2; 1Northwestern Polytechnical University, 2National University of Singapore, 3The Chinese University of Hong Kong, 4Institute of Automation at CAS
Pseudocode	Yes	During generation, we condition the generation process on known structural motifs, which are treated as fixed anchors and stored as x1 = {trans1, rot1}. A binary mask determines which parts of the structure are generated and which parts are clamped to the known motif. At each denoising step, we overwrite the motif region in the predicted structure with its true value from x1, ensuring that motif geometry remains unchanged throughout the sampling process. This design enables consistent integration of known substructures while flexibly generating surrounding regions. The pseudocode is shown in Alg. 1.
Open Source Code	Yes	The code is released at https://github.com/Vecteur-libre/EnzyControl.
Open Datasets	Yes	Resolving the absence of high-quality benchmarks, we construct EnzyBind, a curated dataset of 11,100 enzyme-substrate pairs derived from PDBbind [48]. Each entry is enriched with functional site annotations via MSA. Further, we leverage enzyme family classification for evaluating the consistency of enzyme commission (EC) number between the generated sample and its target native enzyme, thereby providing a more rigorous evaluation framework. EnzyBind is made available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. This license allows users to copy, redistribute, remix, transform, and build upon the dataset for any purpose, including commercial use, provided appropriate credit is given to the creators. A copy of the license is available at https://creativecommons.org/licenses/by/4.0/.
Dataset Splits	Yes	Traditional data-splitting strategies for enzyme datasets often rely on chronological order: training on complexes published before a certain date and testing on those published afterward. However, since our objective is to generate enzyme backbones conditioned on desired functions, we adopt a functionally meaningful split based on sequence similarity. Specifically, we use CD-HIT [96] to cluster enzyme sequences and ensure that enzymes in the training and test sets are disjoint. Clusters are then randomly assigned to either training or testing, and enzyme-substrate pairs are sampled accordingly.
Hardware Specification	Yes	All experiments were conducted on a high-performance computing node equipped with 4 NVIDIA A100 GPUs (80GB) and dual Intel(R) Xeon(R) Gold 6348 CPUs (2.60GHz, 2 sockets, 28 cores per socket, 112 threads in total).
Software Dependencies	No	The paper mentions several software tools, such as the RDKit library [73], Open Babel [98], ProteinMPNN [20], ESMFold [87], CLEAN [92], UniKP [93], and Gnina [94], but does not specify their version numbers in the main text or appendices for replication purposes.
Experiment Setup	Yes	We adopt Low-Rank Adaptation (LoRA) with a rank of r = 16 and a scaling factor α = 32, targeting key linear projection modules across attention and embedding components, as specified in Table 11. The node and edge embeddings are configured with dimensionalities of 256 and 128, respectively. Our model supports a maximum of 2000 residues and embeds 1000 discrete timesteps using both sinusoidal and learned positional encodings. Node-level features include spatial coordinates, timestep embeddings, and optional chain-level signals. For edge features, we employ relative position encoding, discretized into 22 bins, and include diffusion-specific masks and self-conditioning mechanisms to enhance robustness. The IPA module comprises six stacked blocks with multi-head attention (8 heads), point-based QK and V projections, and a lightweight sequence-level Transformer consisting of 2 layers with 4 heads each. These configurations were selected based on empirical validation to balance computational efficiency with modeling capacity.
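
Illustrative sketch (motif clamping): The Pseudocode row describes overwriting the motif region with its known value after every denoising step, under a binary mask. The following is a minimal sketch of that idea only, not the paper's Alg. 1. It assumes one scalar per residue instead of the {trans, rot} frames used in the paper, and the names clamp_motif, sample_with_motif, and denoise_step are hypothetical.

```python
def clamp_motif(predicted, motif, mask):
    """Overwrite motif positions in `predicted` with their known values.

    mask[i] is True where residue i belongs to the fixed motif.
    """
    if not (len(predicted) == len(motif) == len(mask)):
        raise ValueError("all inputs must have one entry per residue")
    return [m if fixed else p for p, m, fixed in zip(predicted, motif, mask)]


def sample_with_motif(x_init, motif, mask, denoise_step, num_steps):
    """Run a denoising loop, re-imposing the motif after every step."""
    x = clamp_motif(x_init, motif, mask)
    for step in range(num_steps):
        x = denoise_step(x, step)        # hypothetical model update
        x = clamp_motif(x, motif, mask)  # motif values stay unchanged
    return x


# Toy usage: scalar "coordinates" and a stand-in denoiser that shifts every value.
motif = [1.0, 2.0, 0.0, 0.0]
mask = [True, True, False, False]
result = sample_with_motif(
    [9.0, 9.0, 9.0, 9.0], motif, mask,
    lambda x, t: [v - 1.0 for v in x], num_steps=3,
)
```

In this toy run the two masked positions keep their motif values while the unmasked positions are free to change at every step.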
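
Illustrative sketch (cluster-level split): The Dataset Splits row describes assigning whole sequence clusters at random to training or testing, so that no cluster contributes to both sets. The sketch below shows one way this could look, assuming cluster IDs have already been obtained (for example from CD-HIT output, whose parsing is not shown). The function name split_by_cluster, the test_fraction value, and the enzyme IDs are hypothetical and not taken from the paper.

```python
import random


def split_by_cluster(cluster_of, test_fraction=0.2, seed=0):
    """Assign whole clusters to train or test so no cluster spans both.

    cluster_of maps an enzyme ID to its sequence-cluster ID.
    """
    clusters = sorted(set(cluster_of.values()))
    rng = random.Random(seed)
    rng.shuffle(clusters)
    n_test = max(1, round(len(clusters) * test_fraction))
    test_clusters = set(clusters[:n_test])
    train = [e for e, c in cluster_of.items() if c not in test_clusters]
    test = [e for e, c in cluster_of.items() if c in test_clusters]
    return train, test


# Toy usage with made-up enzyme IDs grouped into four clusters.
cluster_of = {"enzA": 0, "enzB": 0, "enzC": 1, "enzD": 2, "enzE": 2, "enzF": 3}
train_ids, test_ids = split_by_cluster(cluster_of, test_fraction=0.25)
```

Splitting at the cluster level, not the sequence level, is what keeps near-identical sequences from appearing on both sides of the split.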
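
Illustrative sketch (LoRA scaling): The Experiment Setup row reports a LoRA rank of r = 16 and a scaling factor α = 32. In the standard LoRA formulation, the adapted layer computes W x + (α / r) B A x, so these values give an update scale of 2. The sketch below uses plain Python lists and toy layer sizes; the dimensions, the initialization scales, and the helper names are assumptions for illustration and do not reflect the paper's implementation.

```python
import random


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def lora_forward(W, A, B, x, rank, alpha):
    """Compute W x + (alpha / rank) * B (A x)."""
    scale = alpha / rank
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scale * u for b, u in zip(base, update)]


RANK, ALPHA = 16, 32   # values reported in the Experiment Setup row
D_IN, D_OUT = 64, 64   # toy layer sizes (hypothetical)

rng = random.Random(0)
W = [[rng.gauss(0, 1) for _ in range(D_IN)] for _ in range(D_OUT)]    # frozen weight
A = [[rng.gauss(0, 0.01) for _ in range(D_IN)] for _ in range(RANK)]  # trainable down-projection
B = [[0.0] * RANK for _ in range(D_OUT)]  # zero init: the adapter starts as a no-op
x = [rng.gauss(0, 1) for _ in range(D_IN)]

y = lora_forward(W, A, B, x, RANK, ALPHA)
```

Because B starts at zero, the adapted layer initially reproduces the frozen layer's output, and only A and B would be updated during fine-tuning.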