Protein Sequence and Structure Co-Design with Equivariant Translation

Authors: Chence Shi, Chuanrui Wang, Jiarui Lu, Bozitao Zhong, Jian Tang

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments on the Structural Antibody Database (SAbDab) (Dunbar et al., 2014) as well as two protein design benchmark datasets curated from CATH (Orengo et al., 1997), and compare PROTSEED against previous state-of-the-art methods on multiple tasks, ranging from antigen-specific antibody CDR design to context-conditioned protein design and fixed backbone protein design. Numerical results show that our method significantly outperforms previous baselines and can generate high fidelity proteins in terms of both sequence and structure, while running orders of magnitude faster than sampling-based methods.
Researcher Affiliation | Collaboration | Chence Shi (1,2,3), Chuanrui Wang (2,3), Jiarui Lu (2,3), Bozitao Zhong (2,3), Jian Tang (1,2,4,5). Affiliations: 1 BioGeometry; 2 Mila - Québec AI Institute; 3 Université de Montréal; 4 HEC Montréal; 5 CIFAR AI Research Chair. Emails: chence.shi@{biogeom.com,umontreal.ca}, {chuanrui.wang,jiarui.lu,bozitao.zhong}@umontreal.ca, jian.tang@hec.ca
Pseudocode | Yes | The pseudocode of PROTSEED is provided in Algorithm 1. The proposed PROTSEED consists of a trigonometry-aware encoder (Algorithm 1, lines 2-7) that reasons about geometric constraints and interactions from context features, and a roto-translation equivariant decoder (Algorithm 1, lines 10-19) that translates protein sequence and structure interdependently.
Open Source Code | No | All codes, datasets, and experimental environments will be released upon the acceptance of this work.
Open Datasets | Yes | We conduct extensive experiments on the Structural Antibody Database (SAbDab) (Dunbar et al., 2014) as well as two protein design benchmark datasets curated from CATH (Orengo et al., 1997)
Dataset Splits | Yes | The clusters are then divided into training, validation, and test set with a ratio of 8:1:1.
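The key detail in this split is that the 8:1:1 ratio is applied at the cluster level rather than the chain level, so sequence-similar entries never leak across splits. A minimal sketch of that procedure (function name and seed are illustrative, not from the paper):

```python
import random

def split_clusters(cluster_ids, ratios=(0.8, 0.1, 0.1), seed=0):
    """Split cluster IDs (not individual chains) into train/val/test,
    so similar sequences never cross split boundaries."""
    ids = list(cluster_ids)
    random.Random(seed).shuffle(ids)  # deterministic shuffle for reproducibility
    n = len(ids)
    n_train = int(ratios[0] * n)
    n_val = int(ratios[1] * n)
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]

train, val, test = split_clusters(range(100))
print(len(train), len(val), len(test))  # 80 10 10
```

Each split then expands back to the member chains of its clusters; the clustering itself would come from a tool such as MMseqs2, which the paper mentions.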
Hardware Specification | Yes | To demonstrate the efficiency of our method, we test inference stage of different approaches using a single V100 GPU card on the same machine, and present average runtime of these methods on proteins of different sizes.
Software Dependencies | No | The paper states 'PROTSEED is implemented in Pytorch.' but does not provide specific version numbers for PyTorch or any other key software libraries or dependencies. It also mentions tools like MMseqs2 and DSSP, but without version information.
Experiment Setup | Yes | The hidden dimension is set as 128 for pair features and 256 for single features across all modules. For training, we use a learning rate of 0.001 with 2000 linear warmup iterations. The model is optimized with Adam optimizer on four Tesla V100 GPU cards with distributed data parallel.
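The stated schedule (base LR 0.001, 2000 linear warmup iterations) can be sketched as a plain multiplier function; the constant LR after warmup is an assumption, since the paper does not specify any decay:

```python
def warmup_lr(step, base_lr=1e-3, warmup_steps=2000):
    """Linear warmup: ramp the learning rate from ~0 to base_lr over
    warmup_steps iterations, then hold it constant (assumed; no decay
    is described in the paper)."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr
```

In PyTorch this shape of schedule is typically wired to the Adam optimizer via `torch.optim.lr_scheduler.LambdaLR`, passing the per-step multiplier `warmup_lr(step) / base_lr`.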