Pre-Training Protein Encoder via Siamese Sequence-Structure Diffusion Trajectory Prediction

Authors: Zuobai Zhang, Minghao Xu, Aurelie C. Lozano, Vijil Chenthamarakshan, Payel Das, Jian Tang

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that the performance of DiffPreT is consistently competitive on all tasks, and SiamDiff achieves new state-of-the-art performance, considering the mean ranks on all tasks.
Researcher Affiliation | Collaboration | Zuobai Zhang (1,2), Minghao Xu (1,2), Aurélie Lozano (3), Vijil Chenthamarakshan (3), Payel Das (3), Jian Tang (1,4,5). Affiliations: 1 Mila Québec AI Institute, 2 Université de Montréal, 3 IBM Research, 4 HEC Montréal, 5 CIFAR AI Chair. Emails: {zuobai.zhang, minghao.xu}@mila.quebec, {ecvijil,aclozano,daspa}@us.ibm.com, jian.tang@hec.ca
Pseudocode | Yes | Algorithm 1: SiamDiff Pre-Training
Open Source Code | No | Code will be released upon acceptance.
Open Datasets | Yes | Following Zhang et al. [87], we pre-train our models with the AlphaFold protein structure database v1 [44, 70], including 365K proteome-wide predicted structures.
Dataset Splits | Yes | The EC task involves 538 binary classification problems... We use dataset splits from Gligorijević et al. [24] with a 95% sequence identity cutoff. The ATOM3D tasks include Protein Interface Prediction (PIP), Mutation Stability Prediction (MSP), Residue Identity (RES), and Protein Structure Ranking (PSR), with different dataset splits based on sequence identity or competition year.
Hardware Specification | Yes | All methods are pre-trained on 4 Tesla A100 GPUs, and Table 5 reports the batch sizes on each GPU. All residue-level tasks are run on 4 V100 GPUs, while all atom-level tasks are run on A100 GPUs.
Software Dependencies | No | All these methods are developed based on PyTorch and TorchDrug [88]. (No version numbers are provided for PyTorch or TorchDrug.)
Experiment Setup | Yes | In DiffPreT, for structure diffusion, we use a sigmoid schedule for the variances βt, with the lowest variance β1 = 1e-4 and the highest variance βT = 0.1. For sequence diffusion, we simply set the cumulative transition probability to [MASK] over time steps as a linear interpolation between a minimum mask rate of 0.15 and a maximum mask rate of 1.0. The number of diffusion steps is set to 100. In SiamDiff, we adopt the same hyperparameters for the multimodal diffusion models. We set the variance of the torsional perturbation noise to 0.1π on the atom level and that of the Gaussian perturbation noise to 0.3 on the residue level when constructing the correlated conformer. (Tables 5 and 6 also provide specific batch sizes, optimizers, and learning rates.)
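To make the quoted noise schedules concrete, the following is a minimal PyTorch sketch (not the authors' released code) of the sigmoid variance schedule for structure diffusion and the linear [MASK]-rate schedule for sequence diffusion. The logistic endpoints (±6) and the function names are illustrative assumptions; β1 = 1e-4, βT = 0.1, T = 100, and the 0.15 to 1.0 mask-rate range are taken from the setup quoted above.

```python
import torch


def sigmoid_beta_schedule(num_steps=100, beta_1=1e-4, beta_T=0.1, limit=6.0):
    """Sigmoid schedule for structure-diffusion variances beta_t.

    The logistic endpoints (+/- `limit`) are an assumption for illustration;
    the paper only states a sigmoid schedule between beta_1 = 1e-4 and
    beta_T = 0.1 over 100 steps.
    """
    t = torch.linspace(-limit, limit, num_steps)
    s = torch.sigmoid(t)                              # monotone curve in (0, 1)
    s = (s - s.min()) / (s.max() - s.min())           # rescale exactly to [0, 1]
    return beta_1 + (beta_T - beta_1) * s             # rescale to [beta_1, beta_T]


def mask_rate_schedule(num_steps=100, min_rate=0.15, max_rate=1.0):
    """Cumulative probability of transitioning a residue to [MASK] at step t,
    linearly interpolated from 0.15 at t = 1 to 1.0 at t = T."""
    steps = torch.arange(num_steps, dtype=torch.float)
    return min_rate + (max_rate - min_rate) * steps / (num_steps - 1)


if __name__ == "__main__":
    betas = sigmoid_beta_schedule()
    mask_rates = mask_rate_schedule()
    print(betas[0].item(), betas[-1].item())            # 1e-4 ... 0.1
    print(mask_rates[0].item(), mask_rates[-1].item())  # 0.15 ... 1.0
```

Running the sketch prints endpoints matching the stated β1 = 1e-4, βT = 0.1 and mask rates 0.15 and 1.0; intermediate values depend on the assumed logistic endpoints.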