PPFLOW: Target-Aware Peptide Design with Torsional Flow Matching
Authors: Haitao Lin, Odin Zhang, Huifeng Zhao, Dejun Jiang, Lirong Wu, Zicheng Liu, Yufei Huang, Stan Z. Li
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that PPFLOW reaches state-of-the-art performance in tasks of peptide drug generation and optimization in comparison with baseline models, and can be generalized to other tasks including docking and side-chain packing. |
| Researcher Affiliation | Academia | Zhejiang University; AI Lab, Research Center for Industries of the Future, Westlake University, Hangzhou, China. |
| Pseudocode | No | The paper describes the sampling process as a solution of ordinary differential equations in Section 3.6, with steps clearly outlined in text and equations, but it does not include a formally labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | Our method has been opened to the public in https://github.com/Edapinenut/ppflow. |
| Open Datasets | Yes | To satisfy the need for massive data to train deep learning models, we construct PPBench2024, through a series of systematic steps: First, we source complexes from the RCSB database (Zardecki et al., 2016), specifically selecting those containing more than two chains and excluding any with nucleic acid structures, and defining interactions between a pair as a minimum intermolecular distance of 5.0 Å or less. [...] Further, we screen the existing datasets of Propedia V2.3 (Martins et al., 2023) and PepBDB (Wen et al., 2018) with the same criterion, leading to an additional 6523 instances to expand it. |
| Dataset Splits | Yes | We split the PPBench2024 into training and validation sets according to the clustering of the proteins that are closest to the peptide ligand via MMseqs2 (Steinegger & Söding, 2017) with a ratio of 9:1. |
| Hardware Specification | No | The paper mentions that 'For example, ∆G obtained by ADCP requires more than 10 minutes for one pair of protein and peptides on a server with 128 CPU threads' and acknowledges 'the Westlake University HPC Center for providing computational resources,' but does not specify exact CPU models, GPU models, or other detailed hardware specifications. |
| Software Dependencies | No | The paper mentions several software tools and libraries used (e.g., ADCP, FOLDX, MMSEQS2, ROSETTA, HDOCK, VINADOCK), but it does not specify their version numbers, which is necessary for reproducibility. |
| Experiment Setup | Yes | Here we give the hyper-parameters and other training details. The learning rate lr is 5e-5. In all training, the max training iteration is 200,000. A LambdaLR schedule is used, with lr_lambda set as 0.95. The batch size is set to 16 or 32, because it affects the performance little. In the neural networks, we set the MLP for extracting pair relations as 2 layers with hidden dimension 64, and the MLP for single amino acids as 2 layers with hidden dimension 128. Then, 6 transformer layers are stacked behind, and the final layer is the LOCS that has been discussed before. |
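The Dataset Splits row describes a cluster-level 9:1 train/validation split, where clustering (via MMseqs2) of the protein closest to the peptide ligand decides membership so that similar targets never straddle the split. A minimal sketch of that idea is below; the `cluster_of` mapping and `split_by_cluster` helper are illustrative stand-ins, not the authors' pipeline, and the cluster assignments would in practice come from MMseqs2 output.

```python
import random

def split_by_cluster(cluster_of, ratio=0.9, seed=0):
    """Split complex IDs into train/validation at the cluster level.

    cluster_of: dict mapping a complex ID to the cluster ID of its
    target protein (e.g. parsed from MMseqs2 clustering results).
    """
    clusters = sorted(set(cluster_of.values()))
    rng = random.Random(seed)
    rng.shuffle(clusters)
    n_train = int(len(clusters) * ratio)
    train_clusters = set(clusters[:n_train])
    # Assign whole clusters, so no target family appears in both sets.
    train = [c for c, k in cluster_of.items() if k in train_clusters]
    val = [c for c, k in cluster_of.items() if k not in train_clusters]
    return train, val
```

Splitting by cluster rather than by individual complex is what makes the validation set measure generalization to unseen targets instead of memorization of near-duplicate structures.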
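The Experiment Setup row can be collected into a training configuration. The paper does not name its framework; assuming PyTorch, and reading "lr_lambda set as 0.95" as a per-step multiplicative decay factor, the reported settings might look like the sketch below. The placeholder model is purely illustrative; only the optimizer, schedule, iteration count, and batch size come from the quoted text.

```python
import torch
from torch import nn

# Placeholder network; the paper's actual model stacks pair/amino-acid
# MLPs, 6 transformer layers, and a final LOCS layer.
model = nn.Sequential(nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 128))

optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)
# LambdaLR multiplies the base lr by lr_lambda(step); 0.95**step is one
# reading of the reported "lr_lambda set as 0.95".
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda step: 0.95 ** step
)

MAX_ITERS = 200_000
BATCH_SIZE = 16  # paper reports 16 or 32, with little effect on performance
```

Note that with per-iteration decay, 0.95 shrinks the learning rate very quickly; the paper may instead step the schedule per epoch or per fixed interval, which this sketch leaves open.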