Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Hotspot-Driven Peptide Design via Multi-Fragment Autoregressive Extension

Authors: Jiahan Li, Tong Chen, Shitong Luo, Chaoran Cheng, Jiaqi Guan, Ruihan Guo, Sheng Wang, Ge Liu, Jian Peng, Jianzhu Ma

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments, including peptide design and peptide scaffold generation, demonstrate the strong potential of PepHAR in computational peptide binder design. "Through extensive experiments on both peptide generation and scaffold-based design, we demonstrated the effectiveness of PepHAR in computational peptide design, highlighting its potential for advancing drug discovery and therapeutic development. We evaluate PepHAR and several baseline methods on two main tasks: (1) Peptide Binder Design and (2) Peptide Scaffold Generation. In Peptide Design, we co-generate both the structure and sequence of peptides based on their binding pockets within the target protein." Table 1: Evaluation of methods in the peptide design task. Table 2: Evaluation of methods in the scaffold generation task. Table 3: Ablation results of PepHAR in the peptide design task.
Researcher Affiliation | Collaboration | Jiahan Li¹, Tong Chen², Shitong Luo³, Chaoran Cheng⁴, Jiaqi Guan⁵, Ruihan Guo⁶, Sheng Wang², Ge Liu⁴, Jian Peng⁶, Jianzhu Ma¹; ¹Tsinghua University, ²University of Washington, ³Massachusetts Institute of Technology, ⁴University of Illinois Urbana-Champaign, ⁵ByteDance Inc., ⁶Helixon Inc.
Pseudocode | Yes | Algorithm 1: Peptide Sampling Outline. Data: target protein T, peptide length N, hot-spot residue count k, and indices [i1, ..., ik].
Open Source Code | Yes | The source code will be available at https://github.com/Ced3-han/PepHAR.
Open Datasets | Yes | Following Li et al. (2024a), we construct our training and test datasets. This moderate-length benchmark is derived from PepBDB (Wen et al., 2019) and Q-BioLiP (Wei et al., 2024), with duplicates and low-quality entries removed.
Dataset Splits | Yes | The dataset consists of 158 complexes across 10 clusters from MMseqs2 (Steinegger & Söding, 2017), with an additional 8,207 non-homologous examples used for training and validation.
Hardware Specification | Yes | Both the density and prediction models are trained on 8 NVIDIA A100 GPUs.
Software Dependencies | No | The paper mentions using PyRosetta and AlphaFold2-Multimer but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | We set the batch size to 64, using Adam as the optimizer with a learning rate of 3×10⁻⁴. To prevent overfitting, we also applied a dropout rate of 0.5 at each layer in the IPA. We employed an early stopping strategy, training the density model for 1,400 iterations and the prediction model for 2,400 iterations. The IPA encoder consists of 4 layers, each with 8 query heads and a hidden dimension of 32. For the sampling process, we use 10 iterations with an update rate of 0.01 in the founding stage to sample anchors by default, followed by 100 fine-tuning steps with an update rate of 0.1 in the correction stage.
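The reported hyperparameters can be collected into a single configuration sketch for anyone attempting to reproduce the setup. This is a minimal illustration, not code from the PepHAR repository; the class and field names are our own, while every value comes from the Experiment Setup row above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PepHARTrainingConfig:
    """Training hyperparameters as reported in the paper."""
    batch_size: int = 64
    optimizer: str = "Adam"
    learning_rate: float = 3e-4
    dropout: float = 0.5              # applied at each IPA layer
    density_model_iters: int = 1400   # early stopping point
    prediction_model_iters: int = 2400
    # IPA encoder architecture
    ipa_layers: int = 4
    ipa_query_heads: int = 8
    ipa_hidden_dim: int = 32

@dataclass(frozen=True)
class PepHARSamplingConfig:
    """Two-stage sampling schedule (founding, then correction)."""
    founding_iters: int = 10
    founding_update_rate: float = 0.01
    correction_steps: int = 100
    correction_update_rate: float = 0.1
```

Freezing the dataclasses makes the reported values immutable defaults, so a reproduction script can override individual fields explicitly rather than mutating shared state.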
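Combining the Algorithm 1 outline with the sampling schedule from the Experiment Setup row, the control flow of the sampler can be sketched as follows. This is a control-flow skeleton only, under our own assumptions: the function names are hypothetical, random noise stands in for the learned density and prediction models, and residue states are reduced to scalars.

```python
import random

def sample_peptide(target, length, hotspot_indices,
                   founding_iters=10, founding_rate=0.01,
                   correction_steps=100, correction_rate=0.1):
    """Skeleton of the two-stage sampling schedule; `target` is unused here."""
    # Founding stage: iteratively refine anchor residues at the hot-spot
    # indices (10 iterations with update rate 0.01 by default).
    anchors = {i: random.random() for i in hotspot_indices}
    for _ in range(founding_iters):
        for i in anchors:
            anchors[i] += founding_rate * (random.random() - 0.5)
    # Extension: grow fragments autoregressively from the anchors until the
    # peptide reaches the requested length N (placeholder values here).
    peptide = [anchors.get(i, 0.0) for i in range(length)]
    # Correction stage: fine-tune the assembled peptide
    # (100 steps with update rate 0.1 by default).
    for _ in range(correction_steps):
        peptide = [x + correction_rate * (random.random() - 0.5)
                   for x in peptide]
    return peptide
```

In the actual method, each update would be driven by the trained density and prediction models rather than noise; the skeleton only shows how the founding, extension, and correction stages compose.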