MSA Generation with Seqs2Seqs Pretraining: Advancing Protein Structure Predictions
Authors: Le Zhang, Jiayang Chen, Tao Shen, Yu Li, Siqi Sun
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on CASP14 and CASP15 benchmarks reveal significant improvements in LDDT scores, particularly for complex and challenging sequences, enhancing the performance of both AlphaFold2 and RoseTTAFold. |
| Researcher Affiliation | Collaboration | Le Zhang (1,3), Jiayang Chen (4), Tao Shen (5), Yu Li (4), Siqi Sun (1,2). Affiliations: 1 Fudan University; 2 Shanghai Artificial Intelligence Laboratory; 3 Mila, Université de Montréal; 4 The Chinese University of Hong Kong; 5 Zelixir Biotech |
| Pseudocode | No | The paper describes the architecture and process but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is released at https://github.com/lezhang7/MSAGen. |
| Open Datasets | Yes | We employ CASP14/15 as our test set, a prestigious dataset that encompasses proteins from a broad spectrum of biological families. ... This process was iterated until no additional sequences emerged; search parameters are detailed in Appendix C. For every batch of sequences retrieved, a random selection was made, designating the query together with some sequences as the source X and the remainder as the target Y, as illustrated in fig. 2. Notably, the assurance of co-evolutionary relationships is intrinsically facilitated by the search algorithm's mechanism. (See the data-construction sketch after the table.) |
| Dataset Splits | No | The paper mentions CASP14/15 as the test set and a separate pretraining dataset, but does not explicitly detail train/validation/test splits for the pretraining dataset or validation splits for the evaluation datasets. |
| Hardware Specification | Yes | It is pretrained with AdamW at a 5e-5 learning rate, 0.01 linear warm-up, and square-root decay for 200k steps on 8 A100 GPUs with a batch size of 64, using a dataset containing 2M MSAs constructed as described in Section 3.1. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies such as deep learning frameworks or libraries in the main text or appendices relevant to the research content. |
| Experiment Setup | Yes | The pretrained MSA-Generator adopts 12 transformer encoders/decoders with 260M parameters, a 768 embedding size, and 12 heads. It is pretrained with AdamW at a 5e-5 learning rate, 0.01 linear warm-up, and square-root decay for 200k steps on 8 A100 GPUs with a batch size of 64, using a dataset containing 2M MSAs constructed as described in Section 3.1. (See the configuration sketch after the table.) |
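
To make the source/target construction quoted under "Open Datasets" concrete, the following is a minimal Python sketch of how one retrieved MSA batch could be split into a source X (the query plus a random subset of hits) and a target Y (the remaining hits). The function name, the split fraction, and the placeholder sequences are illustrative assumptions, not the paper's actual pipeline; the authors' implementation is available at https://github.com/lezhang7/MSAGen.

```python
import random

def split_msa_batch(query, retrieved_seqs, src_fraction=0.5, seed=None):
    """Illustrative split of one retrieved MSA batch into a source/target pair.

    The paper designates the query plus a random subset of retrieved sequences
    as the source X and the remaining sequences as the target Y; the exact
    fraction and sampling scheme here are assumptions.
    """
    rng = random.Random(seed)
    seqs = list(retrieved_seqs)
    rng.shuffle(seqs)
    n_src = int(len(seqs) * src_fraction)
    source_x = [query] + seqs[:n_src]   # query always stays on the source side
    target_y = seqs[n_src:]             # remainder becomes the generation target
    return source_x, target_y

# Hypothetical usage with placeholder protein sequences
x, y = split_msa_batch("MKTAYIAKQR", ["MKSAYIAKQR", "MKTAYLAKQR", "MRTAYIAKQR"], seed=0)
```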
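The hyperparameters quoted under "Experiment Setup" can also be written out as a configuration sketch. The dictionary keys, the stand-in module, and the exact form of the "linear warm-up, square root decay" schedule (read here as inverse-square-root decay) are assumptions for illustration; only the numeric values come from the paper.

```python
import torch

# Values reported in the paper; the key names and schedule details are assumptions.
config = {
    "num_encoder_layers": 12,
    "num_decoder_layers": 12,
    "embedding_size": 768,
    "num_attention_heads": 12,
    "total_parameters": "260M",
    "learning_rate": 5e-5,
    "warmup_fraction": 0.01,
    "total_steps": 200_000,
    "batch_size": 64,
    "num_gpus": 8,  # A100
}

def inverse_sqrt_with_warmup(step, warmup_steps):
    """Linear warm-up followed by inverse-square-root decay (one common reading
    of 'square root decay'); returns a multiplier on the base learning rate."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return (warmup_steps / step) ** 0.5

# Wiring the schedule into a PyTorch AdamW optimizer; the Linear module is a
# stand-in for the actual MSA-Generator model, used only for illustration.
model = torch.nn.Linear(config["embedding_size"], config["embedding_size"])
optimizer = torch.optim.AdamW(model.parameters(), lr=config["learning_rate"])
warmup_steps = int(config["warmup_fraction"] * config["total_steps"])
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda step: inverse_sqrt_with_warmup(step, warmup_steps)
)
```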