Full-Atom Peptide Design with Geometric Latent Diffusion
Authors: Xiangzhe Kong, Yinjun Jia, Wenbing Huang, Yang Liu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We first establish a benchmark consisting of both 1D sequences and 3D structures from the Protein Data Bank (PDB) and literature for systematic evaluation. We then identify two major challenges of leveraging current diffusion-based models for peptide design: the full-atom geometry and the variable binding geometry. To tackle the first challenge, PepGLAD derives a variational autoencoder that first encodes full-atom residues of variable size into fixed-dimensional latent representations, and then decodes back to the residue space after conducting the diffusion process in the latent space. For the second issue, PepGLAD explores a receptor-specific affine transformation to convert the 3D coordinates into a shared standard space, enabling better generalization ability across different binding shapes. Experimental results show that our method not only improves diversity and binding affinity significantly in the task of sequence-structure co-design, but also excels at recovering reference structures for binding conformation generation. *(A hedged sketch of this coordinate standardization appears after the table.)* |
| Researcher Affiliation | Collaboration | Xiangzhe Kong1,2 Yinjun Jia3 Wenbing Huang4 Yang Liu1,2,5 1Dept. of Comp. Sci. & Tech., Tsinghua University 2Institute for AIR, Tsinghua University 3School of Life Sciences, Tsinghua University 4Gaoling School of Artificial Intelligence, Renmin University of China 5Shanghai Artificial Intelligence Laboratory, Shanghai, China |
| Pseudocode | Yes | We provide the overall training procedure in Algorithm 1 (see Appendix D). ... The sampling procedure includes the generative diffusion process on the standard latent states, recovering the original geometry with the inverse of F in Eq. 7, and decoding the sequence as well as the full-atom structure of the peptide (see Algorithm 2 in Appendix D). *(A sketch of these sampling steps appears after the table.)* |
| Open Source Code | Yes | The code for our PepGLAD is open-sourced at https://github.com/THUNLP-MT/PepGLAD. |
| Open Datasets | Yes | We first extract all dimers from the Protein Data Bank (PDB) [5] and select the complexes with a receptor longer than 30 residues and a ligand between 4 and 25 residues [59]... We also implement a split on PepBDB [65] based on clustering results for evaluation... Further, we exploit 70k unsupervised data from protein fragments (ProtFrag) to facilitate training of the variational autoencoder. We show details and statistics of these datasets in Appendix E. The curated PepBench and ProtFrag are available at https://zenodo.org/records/13373108. |
| Dataset Splits | Yes | Finally, the remaining data are randomly split based on clustering results into training and validation sets, yielding a new benchmark called PepBench. Further, we exploit 70k unsupervised data from protein fragments (ProtFrag) to facilitate training of the variational autoencoder. We also implement a split on PepBDB [65] based on clustering results for evaluation. We show details and statistics of these datasets in Appendix E. |
| Hardware Specification | No | We train PepGLAD on a GPU with 24 GB of memory with the AdamW optimizer. (Appendix I.1) |
| Software Dependencies | No | The paper mentions software like FreeSASA, MMseqs2, Biopython, Rosetta, and PyRosetta but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | We train PepGLAD on a GPU with 24 GB of memory with the AdamW optimizer. For the autoencoder, we train for 60 epochs with dynamic batches... The initial learning rate is 10⁻⁴ and decays by 0.8... Regarding the diffusion model, we train for 500 epochs... The learning rate is 10⁻⁴ and decays by 0.6... The hyperparameters of PepGLAD used in our experiments are provided in Table 9. *(An illustrative optimizer configuration appears after the table.)* |
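
The "receptor-specific affine transformation" quoted in the Research Type row is not reproduced in this report, so the sketch below only illustrates one plausible construction: a whitening map built from the mean and covariance of the binding-site coordinates, with an explicit inverse as required by the quoted Eq. 7. The function names (`fit_affine`, `to_standard_space`, `from_standard_space`) are illustrative and not the authors' API.

```python
import numpy as np

def fit_affine(binding_site_coords: np.ndarray):
    """Fit an affine map F(x) = A(x - mu) from binding-site coordinates (N x 3),
    where A is the inverse square root of the coordinate covariance (a whitening
    transform). Returns the mean, the forward matrix, and its inverse."""
    mu = binding_site_coords.mean(axis=0)                      # (3,)
    centered = binding_site_coords - mu
    cov = centered.T @ centered / len(binding_site_coords)     # (3, 3)
    w, U = np.linalg.eigh(cov)                                 # cov = U diag(w) U^T
    A = U @ np.diag(1.0 / np.sqrt(w + 1e-8)) @ U.T             # cov^{-1/2}
    A_inv = U @ np.diag(np.sqrt(w + 1e-8)) @ U.T               # cov^{1/2}
    return mu, A, A_inv

def to_standard_space(x: np.ndarray, mu, A):
    """Map 3D coordinates into the shared standard space."""
    return (x - mu) @ A.T

def from_standard_space(z: np.ndarray, mu, A_inv):
    """Invert the affine map after sampling in the standard space."""
    return z @ A_inv.T + mu
```

Because A is receptor-specific, every binding pocket is normalized to a comparable scale and orientation before diffusion, which is what the abstract credits for better generalization across binding shapes.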
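
The Pseudocode row summarizes Algorithm 2 as three steps: reverse diffusion on the standard latent states, inversion of the transformation F from Eq. 7, and decoding to sequence plus full-atom structure. The sketch below arranges those steps in order, reusing `from_standard_space` from the sketch above; `diffusion.sample` and `decoder` are hypothetical stand-ins for the released PepGLAD implementation, not its actual interface.

```python
def sample_peptide(diffusion, decoder, receptor, mu, A_inv, num_steps=100):
    """Hypothetical outline of the sampling procedure (Algorithm 2)."""
    # 1) Reverse diffusion over latent sequence/structure states,
    #    conditioned on the receptor, in the shared standard space.
    z_seq, z_coord = diffusion.sample(condition=receptor, steps=num_steps)
    # 2) Undo the receptor-specific affine map (inverse of Eq. 7).
    z_coord = from_standard_space(z_coord, mu, A_inv)
    # 3) Decode fixed-dimensional latents into residue types and full-atom coordinates.
    sequence, atoms = decoder(z_seq, z_coord, receptor)
    return sequence, atoms
```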
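
The Experiment Setup row reports AdamW with an initial learning rate of 10⁻⁴ and multiplicative decay factors of 0.8 (autoencoder) and 0.6 (diffusion model), but the quote truncates the schedule. The configuration below is only one plausible PyTorch realization; the plateau-based trigger is an assumption, not something stated in the paper.

```python
import torch

# Placeholder module standing in for PepGLAD's autoencoder / diffusion networks.
model = torch.nn.Linear(32, 32)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
# Decay factor 0.8 corresponds to the autoencoder (0.6 for the diffusion model).
# The decay trigger (per epoch vs. validation plateau) is not quoted, so a
# plateau scheduler is assumed here purely for illustration.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.8)
```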