ProSST: Protein Language Modeling with Quantized Structure and Disentangled Attention
Authors: Mingchen Li, Yang Tan, Xinzhu Ma, Bozitao Zhong, Huiqun Yu, Ziyi Zhou, Wanli Ouyang, Bingxin Zhou, Pan Tan, Liang Hong
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate the proposed ProSST, we conduct extensive experiments on zero-shot mutation effect prediction and several supervised downstream tasks, where ProSST achieves state-of-the-art performance among all baselines. Our code and pre-trained models are publicly available. |
| Researcher Affiliation | Collaboration | 1 Shanghai Jiao Tong University, China {zy-zhou,bingxin.zhou,hongl3liang}@sjtu.edu.cn, tpan1039@gmail.com; 2 Shanghai Artificial Intelligence Laboratory, China {ouyang-wanli,maxinzhu}@pjlab.org.cn; 3 East China University of Science and Technology, China {lmc,tyang}@mail.ecust.edu.cn, yhq@ecust.edu.cn; 4 The Chinese University of Hong Kong, China zbztzhz@gmail.com; 5 Chongqing Artificial Intelligence Research Institute of Shanghai Jiao Tong University, China |
| Pseudocode | No | The paper describes methods in text and uses figures (e.g., Figure 1, Figure 2) to illustrate architecture and pipeline, but it does not contain formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code and pre-trained models are publicly available at https://github.com/ai4protein/ProSST. |
| Open Datasets | Yes | The pre-training data is collected from AlphaFold DB [13], which contains more than 214 million structures predicted by AlphaFold [11]. We downloaded the 90% reduced version, containing 18.8 million structures. ... The dataset used for training the structure encoder originates from CATH43-S40. This dataset is manually annotated... We utilize the ProteinGym benchmark [43] to assess the zero-shot mutant effect prediction capabilities of ProSST. ... The downstream datasets have the same train, valid, and test splits as SaProt's and are downloaded from SaProt. Data statistics are provided in Table A5. |
| Dataset Splits | Yes | From this collection, we randomly select 100,000 structures for validation (sequences with a similarity of over 30% to the training set will be removed for data deduplication). ... The downstream datasets have the same train, valid, and test splits as SaProt's and are downloaded from SaProt. Data statistics are provided in Table A5 (Table A5 lists the Training, Valid, Test, and Total counts for each task). |
| Hardware Specification | Yes | All ProSST models are trained on a DGX-A800 server (8× 80 GB GPUs) in BF16 precision for about a month. ... We computed the inference speed of ProSST, SaProt (650M), and SaProt (35M) on proteins of different lengths using a batch size of 16 on a server equipped with two Intel 6248R processors and a 3090 GPU, and the results are shown in Table 5(b). (A hedged timing sketch of this measurement follows the table.) |
| Software Dependencies | No | The paper mentions using 'AdamW [57] as our optimizer' and implicitly relies on Python for its code, but does not specify version numbers for any programming languages, libraries, or other software dependencies crucial for replication. |
| Experiment Setup | Yes | The model has 12 transformer layers, 12 attention heads, 768 embedding dimensions, and 3172 feed-forward dimensions with the GELU activation function. We train with 8192 tokens per mini-batch for 500,000 steps. We use AdamW [57] as our optimizer with β1 and β2 set to 0.9 and 0.999, and a weight decay value of 0.001. We warm up the learning rate from 0 to 0.0002 over the first 2000 steps, then decay it to 0 following a cosine schedule. We use a dropout rate of 0.1 and clip gradients with a clipping value of 1.0. ... For fine-tuning... We use the Adam optimizer with β1 set to 0.9, β2 set to 0.98, and an L2 weight decay of 0.001. The batch size was maintained at 64 ... and the learning rate was set to 0.00003, except for GO annotation prediction, where it was adjusted to 0.00001. (A hedged sketch of the pre-training optimization setup follows the table.) |
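
For concreteness, the pre-training optimization settings quoted in the Experiment Setup row (AdamW with β1=0.9, β2=0.999, weight decay 0.001, a 2,000-step warmup from 0 to a peak learning rate of 0.0002 followed by cosine decay to 0, and gradient clipping at 1.0) can be written as a minimal PyTorch sketch. This is not ProSST's released training code: the model, loss, and data below are placeholders, and only the optimizer/schedule hyperparameters are taken from the paper.

```python
# Minimal sketch of the reported pre-training optimization setup.
# Only the hyperparameters (betas, weight decay, warmup, cosine decay,
# clip value) come from the paper; the model and loss are placeholders.
import math
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

TOTAL_STEPS = 500_000   # reported number of pre-training steps
WARMUP_STEPS = 2_000    # reported warmup length
PEAK_LR = 2e-4          # reported peak learning rate

model = torch.nn.Linear(768, 768)  # placeholder for the 12-layer transformer

optimizer = AdamW(model.parameters(), lr=PEAK_LR,
                  betas=(0.9, 0.999), weight_decay=0.001)

def lr_lambda(step: int) -> float:
    # Linear warmup from 0 over the first 2000 steps,
    # then cosine decay to 0 over the remaining steps.
    if step < WARMUP_STEPS:
        return step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = LambdaLR(optimizer, lr_lambda)

for step in range(10):  # demo loop; actual pre-training runs TOTAL_STEPS
    loss = model(torch.randn(16, 768)).pow(2).mean()  # dummy loss
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # clip at 1.0
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```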
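
The inference-speed comparison mentioned in the Hardware Specification row (Table 5(b): batch size 16, proteins of different lengths) can be approximated with a simple timing loop like the one below. The encoder here is only a stand-in with ProSST-like dimensions (12 layers, 12 heads, 768 embedding dims), not the released ProSST or SaProt checkpoints, and the sequence lengths probed are illustrative; it runs fastest on a GPU.

```python
# Sketch of a per-batch inference timing measurement at batch size 16.
# The model is a generic transformer encoder, not a released checkpoint.
import time
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.TransformerEncoder(
    torch.nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True),
    num_layers=12,
).to(device).eval()

for length in (128, 256, 512, 1024):  # illustrative protein lengths
    batch = torch.randn(16, length, 768, device=device)  # batch size 16
    with torch.no_grad():
        model(batch)  # warm-up pass
        if device == "cuda":
            torch.cuda.synchronize()
        start = time.perf_counter()
        model(batch)  # timed pass
        if device == "cuda":
            torch.cuda.synchronize()
    print(f"length={length}: {time.perf_counter() - start:.4f}s per batch")
```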