Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Universal Biological Sequence Reranking for Improved De Novo Peptide Sequencing
Authors: Zijie Qiu, Jiaqi Wei, Xiang Zhang, Sheng Xu, Kai Zou, Zhi Jin, Zhiqiang Gao, Nanqing Dong, Siqi Sun
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that RankNovo not only surpasses its base models used to generate training candidates for reranking pre-training, but also sets a new state-of-the-art benchmark. Moreover, RankNovo exhibits strong zero-shot generalization to unseen models (those whose generations were not exposed during training), highlighting its robustness and potential as a universal reranking framework for peptide sequencing. |
| Researcher Affiliation | Collaboration | (1) Fudan University, (2) Shanghai Artificial Intelligence Laboratory, (3) Zhejiang University, (4) University of British Columbia, (5) NetMind.AI, (6) Protago Labs Inc, (7) Soochow University. Correspondence to: Siqi Sun <EMAIL>, Nanqing Dong <EMAIL>. |
| Pseudocode | No | The paper describes the model architecture and mathematical formulations for PMD and RMD metrics, but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our source code is provided on GitHub (https://github.com/BEAM-Labs/denovo). |
| Open Datasets | Yes | Following the precedent set by recent studies (Yilmaz et al., 2023; Zhang et al., 2024), we employ three public peptide-spectrum match (PSM) datasets: MassIVE-KB (Wang et al., 2018) for training, and 9-species-V1 (Tran et al., 2017) and 9-species-V2 (Yilmaz et al., 2023) for evaluation, enabling comparisons with state-of-the-art de novo peptide sequencing methods. |
| Dataset Splits | Yes | Each PTM included 62.5K spectra split 8:1:1 for training/validation/testing. |
| Hardware Specification | Yes | The training is conducted on 4 A100 40G GPUs. |
| Software Dependencies | No | The paper mentions implementation details and training parameters but does not specify software dependencies with version numbers (e.g., Python, PyTorch versions). |
| Experiment Setup | Yes | RankNovo is implemented with the following hyperparameters: 8 layers for both the spectrum encoder and peptide feature mixer, 8 attention heads, a model dimension of 512, a feed-forward dimension of 1024, and a dropout rate of 0.30. ... RankNovo is trained using an AdamW optimizer with a learning rate of 1e-4 and weight decay of 8e-5. The model is trained with a batch size of 256 for 5 epochs, including a 1-epoch warm-up period. A cosine learning rate scheduler is employed, and gradients are clipped to 1.5 using L2 norm. |
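
The reported experiment setup and the 8:1:1 dataset split can be collected into a small configuration sketch. The code below is illustrative and is not taken from the authors' repository: the names `TrainConfig`, `lr_at_step`, and `split_counts` are hypothetical, and the schedule shape (linear warm-up from zero, then cosine decay to zero, applied per optimizer step) is an assumption, since the quoted text only states "a 1-epoch warm-up period" and "a cosine learning rate scheduler".

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values as reported in the Experiment Setup row; field names are hypothetical.
    num_layers: int = 8          # spectrum encoder and peptide feature mixer
    num_heads: int = 8
    model_dim: int = 512
    feedforward_dim: int = 1024
    dropout: float = 0.30
    learning_rate: float = 1e-4  # AdamW peak learning rate
    weight_decay: float = 8e-5
    batch_size: int = 256
    epochs: int = 5
    warmup_epochs: int = 1
    grad_clip_norm: float = 1.5  # L2-norm gradient clipping threshold


def lr_at_step(step: int, steps_per_epoch: int, cfg: TrainConfig) -> float:
    """Learning rate at a 0-indexed optimizer step.

    Assumption: linear warm-up from 0, then cosine decay to 0.
    The source does not specify either shape in detail.
    """
    warmup_steps = cfg.warmup_epochs * steps_per_epoch
    total_steps = cfg.epochs * steps_per_epoch
    if step < warmup_steps:
        return cfg.learning_rate * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * cfg.learning_rate * (1.0 + math.cos(math.pi * progress))


def split_counts(n: int, ratios=(8, 1, 1)) -> tuple:
    """Split n items by integer ratios; any remainder goes to the first part."""
    total = sum(ratios)
    counts = [n * r // total for r in ratios]
    counts[0] += n - sum(counts)
    return tuple(counts)
```

Under these assumptions, `split_counts(62_500)` gives 50,000 training, 6,250 validation, and 6,250 test spectra per PTM, and `lr_at_step` reaches the peak rate of 1e-4 at the end of the first epoch before decaying over the remaining four. The number of steps per epoch depends on the training set size, which is not given in the table.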