MSAGPT: Neural Prompting Protein Structure Prediction via MSA Generative Pre-Training

Authors: Bo Chen, Zhilei Bei, Xingyi Cheng, Pan Li, Jie Tang, Le Song

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments confirm the efficacy of MSAGPT in generating faithful virtual MSA to enhance the structure prediction accuracy (up to +8.5% TM-Score on few-shot scenarios).
Researcher Affiliation | Collaboration | Tsinghua University, BioMap Research, MBZUAI
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | The model is available at https://github.com/THUDM/MSAGPT.
Open Datasets | Yes | We utilize the Uniclust30 MSA dataset from Open Protein Set [44], which is processed through an all-against-all search on Uniclust30 [45] using HHblits [46]. This results in approximately 16 million MSAs (see Appendix A.1 for details). A minimal A3M-parsing sketch is given after the table.
Dataset Splits | Yes | For each task, we sample 1000 protein sequences with the corresponding labels. We then use MSAGPT-DPO to generate 32 virtual MSAs with the generation strategy T=0.8 and P=0.8. Both setups are trained briefly (for one epoch) under 5-fold cross-validation, as shown in Table 9, and we report the average performance. A cross-validation sketch is given after the table.
Hardware Specification | Yes | All models are trained on 24 A800 GPUs for 254k updates, consuming about 150 billion tokens.
Software Dependencies | No | The paper mentions software components such as FlashAttention-v1 [42] and AdamW [50], and implies Python/PyTorch/CUDA usage through its GPU training setup, but it does not give version numbers for core dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | For the backbone of MSAGPT, we employ the standard transformer decoder framework [47, 49] and train a 2.8-billion-parameter model with 36 layers, an embedding size of 2560, and 40 attention heads. We use batches of 48 MSAs, each containing 12,288 residues, and follow a BF16 mixed-precision pre-training strategy. We use AdamW [50] as our optimizer with β1 = 0.9, β2 = 0.95, eps = 1e-8, and a learning rate of 1.2e-4. We use a cosine learning rate schedule with a warmup over the first 2.5% of steps, decaying the final learning rate to 10% of the peak learning rate. We use a weight decay of 0.1 and gradient clipping of 1.0, without dropout. An optimizer and schedule sketch is given after the table.