AdaNovo: Towards Robust \emph{De Novo} Peptide Sequencing in Proteomics against Data Biases

Authors: Jun Xia, Shaorong Chen, Jingbo Zhou, Xiaojun Shan, Wenjie Du, Zhangyang Gao, Cheng Tan, Bozhen Hu, Jiangbin Zheng, Stan Z. Li

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments indicate that AdaNovo outperforms previous competitors on the widely-used 9-species benchmark, meanwhile yielding 3.6%-9.4% improvements in PTMs identification.
Researcher Affiliation Academia Jun Xia1, Shaorong Chen1, Jingbo Zhou1, Xiaojun Shan2, Wenjie Du3, Zhangyang Gao1, Cheng Tan1, Bozhen Hu1, Jiangbin Zheng1, Stan Z. Li1 — 1School of Engineering, Westlake University; 2University of California San Diego; 3University of Science and Technology of China
Pseudocode No The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code Yes The code for reproducing the results is available at: https://github.com/Westlake-OmicsAI/adanovo_v1.
Open Datasets Yes We employ the nine-species benchmark initially introduced by DeepNovo [27]. This dataset amalgamates approximately 1.5 million mass spectra from nine distinct species, all employing the same instrument but analyzing peptides from different species.
Dataset Splits Yes Following previous works [27, 19, 37], we adopt a leave-one-out cross-validation framework: for each of the nine species, a model is trained on the other eight species and tested on the held-out one. We further split the data from the eight training species into a training set and a validation set at a 9:1 ratio.
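The leave-one-out protocol described above can be sketched as follows. This is an illustrative sketch only: the function name and toy species identifiers are invented, and the paper's actual 9:1 split is performed over the spectra of the eight training species, which this simplified pooling approximates.

```python
import random

def leave_one_out_splits(spectra_by_species, val_ratio=0.1, seed=0):
    """Yield (test_species, train, val, test) once per held-out species.

    `spectra_by_species` maps a species name to its list of spectra.
    For each held-out test species, the remaining eight species are
    pooled and split 9:1 into training and validation sets.
    """
    rng = random.Random(seed)
    for test_species, test in spectra_by_species.items():
        # Pool spectra from the eight species not held out for testing.
        pool = [x for s, xs in spectra_by_species.items()
                if s != test_species for x in xs]
        rng.shuffle(pool)
        n_val = int(len(pool) * val_ratio)
        yield test_species, pool[n_val:], pool[:n_val], test
```

One split is produced per species, so nine models are trained in total, matching the benchmark's cross-validation design.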
Hardware Specification Yes During the training process, we used one Nvidia A100 GPU with a batch size of 32 ... under the same hardware settings (1 Nvidia A100-SXM4-80GB GPU)
Software Dependencies No The paper mentions "PyTorch" and "AdamW optimizer" but does not provide specific version numbers for these or other software dependencies.
Experiment Setup Yes The MS Encoder, Peptide Decoder #1 and Peptide Decoder #2 in AdaNovo are 9-layer Transformers, all of which come with 512 feed-forward dimensions. During the training process, we used one Nvidia A100 GPU with a batch size of 32. We set the learning rate at 0.0004 and applied a linear warm-up. For gradient updates, we used the AdamW optimizer [9]. The hyperparameters s1 and s2 are tuned within the range {0.05, 0.1, 0.3} using the validation set.
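A minimal PyTorch sketch of the reported optimization setup. The layer count (9), feed-forward width (512), optimizer (AdamW), and learning rate (0.0004) follow the quoted excerpt; `d_model`, `nhead`, and the warm-up length are assumptions the paper does not state.

```python
import torch
from torch import nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

# Stand-in for one of AdaNovo's three Transformers: 9 layers,
# 512 feed-forward dimensions (d_model and nhead are assumed).
model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=512, nhead=8,
                               dim_feedforward=512, batch_first=True),
    num_layers=9,
)

# AdamW with the reported learning rate of 0.0004.
optimizer = AdamW(model.parameters(), lr=4e-4)

# Linear warm-up: scale the lr from ~0 up to its base value over
# `warmup_steps` steps (the warm-up length is not given in the paper).
warmup_steps = 1000
scheduler = LambdaLR(optimizer,
                     lambda step: min(1.0, (step + 1) / warmup_steps))
```

After each training step, `optimizer.step()` is followed by `scheduler.step()`, so the effective learning rate ramps linearly before plateauing at 0.0004.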