Full-Atom Peptide Design with Geometric Latent Diffusion
Authors: Xiangzhe Kong, Yinjun Jia, Wenbing Huang, Yang Liu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We first establish a benchmark consisting of both 1D sequences and 3D structures from the Protein Data Bank (PDB) and literature for systematic evaluation. We then identify two major challenges of leveraging current diffusion-based models for peptide design: the full-atom geometry and the variable binding geometry. To tackle the first challenge, PepGLAD derives a variational autoencoder that first encodes full-atom residues of variable size into fixed-dimensional latent representations, and then decodes back to the residue space after conducting the diffusion process in the latent space. For the second issue, PepGLAD explores a receptor-specific affine transformation to convert the 3D coordinates into a shared standard space, enabling better generalization ability across different binding shapes. Experimental results show that our method not only improves diversity and binding affinity significantly in the task of sequence-structure co-design, but also excels at recovering reference structures for binding conformation generation. *(A hedged sketch of this coordinate standardization appears after the table.)* |
| Researcher Affiliation | Collaboration | Xiangzhe Kong1,2 Yinjun Jia3 Wenbing Huang4 Yang Liu1,2,5 1Dept. of Comp. Sci. & Tech., Tsinghua University 2Institute for AIR, Tsinghua University 3School of Life Sciences, Tsinghua University 4Gaoling School of Artificial Intelligence, Renmin University of China 5Shanghai Artificial Intelligence Laboratory, Shanghai, China |
| Pseudocode | Yes | We provide the overall training procedure in Algorithm 1 (see Appendix D). ... The sampling procedure includes the generative diffusion process on the standard latent states, recovering the original geometry with the inverse of F in Eq. 7, and decoding the sequence as well as the full-atom structure of the peptide (see Algorithm 2 in Appendix D). *(A sketch of these sampling steps appears after the table.)* |
| Open Source Code | Yes | The code for our PepGLAD is open-sourced at https://github.com/THUNLP-MT/PepGLAD. |
| Open Datasets | Yes | We first extract all dimers from the Protein Data Bank (PDB) [5] and select the complexes with a receptor longer than 30 residues and a ligand between 4 and 25 residues [59]... We also implement a split on PepBDB [65] based on clustering results for evaluation... Further, we exploit 70k unsupervised data from protein fragments (ProtFrag) to facilitate training of the variational autoencoder. We show details and statistics of these datasets in Appendix E. The curated PepBench and ProtFrag are available at https://zenodo.org/records/13373108. |
| Dataset Splits | Yes | Finally, the remaining data are randomly split based on clustering results into training and validation sets, yielding a new benchmark called PepBench. Further, we exploit 70k unsupervised data from protein fragments (ProtFrag) to facilitate training of the variational autoencoder. We also implement a split on PepBDB [65] based on clustering results for evaluation. We show details and statistics of these datasets in Appendix E. |
| Hardware Specification | No | We train PepGLAD on a GPU with 24 GB of memory with the AdamW optimizer. (Appendix I.1) |
| Software Dependencies | No | The paper mentions software like FreeSASA, MMseqs2, Biopython, Rosetta, and PyRosetta but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | We train PepGLAD on a GPU with 24 GB of memory with the AdamW optimizer. For the autoencoder, we train for 60 epochs with dynamic batches... The initial learning rate is 10⁻⁴ and decays by 0.8... Regarding the diffusion model, we train for 500 epochs... The learning rate is 10⁻⁴ and decays by 0.6... The hyperparameters of PepGLAD used in our experiments are provided in Table 9. *(An illustrative optimizer configuration appears after the table.)* |
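
The "receptor-specific affine transformation" quoted in the Research Type row is not reproduced in this report, so the sketch below only illustrates one plausible construction: a whitening map built from the mean and covariance of the binding-site coordinates, with an explicit inverse as required by the quoted Eq. 7. The function names (`fit_affine`, `to_standard_space`, `from_standard_space`) are illustrative and not the authors' API.

```python
import numpy as np

def fit_affine(binding_site_coords: np.ndarray):
    """Fit an affine map F(x) = A(x - mu) from binding-site coordinates (N x 3),
    where A is the inverse square root of the coordinate covariance (a whitening
    transform). Returns the mean, the forward matrix, and its inverse."""
    mu = binding_site_coords.mean(axis=0)                      # (3,)
    centered = binding_site_coords - mu
    cov = centered.T @ centered / len(binding_site_coords)     # (3, 3)
    w, U = np.linalg.eigh(cov)                                 # cov = U diag(w) U^T
    A = U @ np.diag(1.0 / np.sqrt(w + 1e-8)) @ U.T             # cov^{-1/2}
    A_inv = U @ np.diag(np.sqrt(w + 1e-8)) @ U.T               # cov^{1/2}
    return mu, A, A_inv

def to_standard_space(x: np.ndarray, mu, A):
    """Map 3D coordinates into the shared standard space."""
    return (x - mu) @ A.T

def from_standard_space(z: np.ndarray, mu, A_inv):
    """Invert the affine map after sampling in the standard space."""
    return z @ A_inv.T + mu
```

Because A is receptor-specific, every binding pocket is normalized to a comparable scale and orientation before diffusion, which is what the abstract credits for better generalization across binding shapes.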
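
The Pseudocode row summarizes Algorithm 2 as three steps: reverse diffusion on the standard latent states, inversion of the transformation F from Eq. 7, and decoding to sequence plus full-atom structure. The sketch below arranges those steps in order, reusing `from_standard_space` from the sketch above; `diffusion.sample` and `decoder` are hypothetical stand-ins for the released PepGLAD implementation, not its actual interface.

```python
def sample_peptide(diffusion, decoder, receptor, mu, A_inv, num_steps=100):
    """Hypothetical outline of the sampling procedure (Algorithm 2)."""
    # 1) Reverse diffusion over latent sequence/structure states,
    #    conditioned on the receptor, in the shared standard space.
    z_seq, z_coord = diffusion.sample(condition=receptor, steps=num_steps)
    # 2) Undo the receptor-specific affine map (inverse of Eq. 7).
    z_coord = from_standard_space(z_coord, mu, A_inv)
    # 3) Decode fixed-dimensional latents into residue types and full-atom coordinates.
    sequence, atoms = decoder(z_seq, z_coord, receptor)
    return sequence, atoms
```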
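
The Experiment Setup row reports AdamW with an initial learning rate of 10⁻⁴ and multiplicative decay factors of 0.8 (autoencoder) and 0.6 (diffusion model), but the quote truncates the schedule. The configuration below is only one plausible PyTorch realization; the plateau-based trigger is an assumption, not something stated in the paper.

```python
import torch

# Placeholder module standing in for PepGLAD's autoencoder / diffusion networks.
model = torch.nn.Linear(32, 32)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
# Decay factor 0.8 corresponds to the autoencoder (0.6 for the diffusion model).
# The decay trigger (per epoch vs. validation plateau) is not quoted, so a
# plateau scheduler is assumed here purely for illustration.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.8)
```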