Generalized Protein Pocket Generation with Prior-Informed Flow Matching

Authors: Zaixi Zhang, Marinka Zitnik, Qi Liu

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that PocketFlow outperforms baselines on multiple benchmarks, e.g., achieving an average improvement of 1.29 in Vina Score and 0.05 in scRMSD. Moreover, modeling interactions makes PocketFlow a generalized generative model across multiple ligand modalities, including small molecules, peptides, and RNA.
Researcher Affiliation | Academia | 1: School of Computer Science and Technology, University of Science and Technology of China; 2: State Key Laboratory of Cognitive Intelligence, Hefei, Anhui, China; 3: Harvard University
Pseudocode | No | No pseudocode or algorithm blocks were found.
Open Source Code | Yes | The code is provided at https://github.com/zaixizhang/PocketFlow.
Open Datasets | Yes | Following previous works [29, 71, 92], we consider two widely used protein-small molecule binding datasets for experimental evaluation: the CrossDocked dataset [27]... the Binding MOAD dataset [34]... To test the generalizability of PocketFlow to other ligand modalities, we further consider PPDBench [3], which contains 133 non-redundant protein-peptide complexes, and PDBBind RNA [80]...
Dataset Splits | Yes | The CrossDocked dataset [27] is generated through cross-docking and is split with mmseqs2 [75] at 30% sequence identity, leading to train/val/test sets of 100k/100/100 complexes. The Binding MOAD dataset [34]... resulting in 40k protein-small molecule pairs for training, 100 pairs for validation, and 100 pairs for testing. (A hedged split sketch follows the table.)
Hardware Specification | Yes | All the baselines are run on the same Tesla A100 GPU. ... We train on a Tesla A100 GPU for 20 epochs.
Software Dependencies | No | The paper mentions software such as Open Babel and the Adam optimizer but does not specify version numbers.
Experiment Setup | Yes | In PocketFlow, the number of network blocks is set to 8, the number of transformer layers within each block is set to 4, and the number of hidden channels used in the IPA calculation is set to 16. The node embedding size Dh and the edge embedding size Dz are set to 128. We removed skip connections and psi-angle prediction. For model training, we use the Adam [45] optimizer with learning rate 0.0001, β1 = 0.9, β2 = 0.999. We train on a Tesla A100 GPU for 20 epochs. In the sampling process, the total number of steps T is set to 50. γ, ξ1, ξ2, and ξ3 are set to 1 in the default setting. (A hedged configuration sketch follows the table.)
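
The split protocol quoted in the Dataset Splits row (cluster sequences at 30% identity with mmseqs2, then assign whole clusters to train/val/test so no cluster is shared across splits) can be sketched as below. This is an illustrative reading, not the authors' code: the `split_by_cluster` function, the cluster TSV layout, and the greedy assignment order are assumptions; only the 30% identity threshold and the 100-complex validation/test sizes come from the quoted text.

```python
import random
from collections import defaultdict


def split_by_cluster(cluster_tsv, n_val=100, n_test=100, seed=0):
    """Split complexes so that no sequence cluster is shared across splits.

    `cluster_tsv` is assumed (not confirmed by the paper) to be the
    two-column `representative<TAB>member` file produced by
    `mmseqs easy-cluster ... --min-seq-id 0.3`.
    """
    # Group complex IDs by their cluster representative.
    clusters = defaultdict(list)
    with open(cluster_tsv) as fh:
        for line in fh:
            rep, member = line.rstrip("\n").split("\t")
            clusters[rep].append(member)

    # Shuffle cluster order, then fill val and test first, rest goes to train.
    cluster_ids = sorted(clusters)
    random.Random(seed).shuffle(cluster_ids)

    train, val, test = [], [], []
    for cid in cluster_ids:
        if len(val) < n_val:
            val.extend(clusters[cid])
        elif len(test) < n_test:
            test.extend(clusters[cid])
        else:
            train.extend(clusters[cid])
    return train, val, test
```

Because splitting happens at the cluster level rather than the complex level, proteins with more than 30% sequence identity can never end up on both sides of the train/test boundary, which is the point of the protocol described above.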
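The Experiment Setup row reads naturally as a model/training configuration plus an Adam optimizer. The sketch below collects those values into a hypothetical `PocketFlowConfig`; the class, field, and function names are assumptions for illustration, while the numeric values are the ones quoted in the table.

```python
from dataclasses import dataclass

import torch


@dataclass
class PocketFlowConfig:
    # Architecture hyperparameters quoted in the Experiment Setup row.
    num_blocks: int = 8            # network blocks
    layers_per_block: int = 4      # transformer layers within each block
    ipa_hidden_channels: int = 16  # hidden channels in the IPA calculation
    node_embed_dim: int = 128      # Dh
    edge_embed_dim: int = 128      # Dz
    # Training settings.
    lr: float = 1e-4
    betas: tuple = (0.9, 0.999)    # β1, β2
    epochs: int = 20
    # Sampling settings.
    num_steps: int = 50            # T
    gamma: float = 1.0             # γ
    xi: tuple = (1.0, 1.0, 1.0)    # ξ1, ξ2, ξ3


def make_optimizer(model: torch.nn.Module, cfg: PocketFlowConfig) -> torch.optim.Adam:
    # Adam with the learning rate and betas reported in the setup description.
    return torch.optim.Adam(model.parameters(), lr=cfg.lr, betas=cfg.betas)
```

A usage example would simply be `opt = make_optimizer(model, PocketFlowConfig())` inside the training loop; the actual PocketFlow training script may organize these settings differently.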