De novo mass spectrometry peptide sequencing with a transformer model
Authors: Melih Yilmaz, William Fondrie, Wout Bittremieux, Sewoong Oh, William S Noble
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that Casanovo achieves state-of-the-art performance on a benchmark dataset using a standard cross-species evaluation framework, which involves testing on spectra with never-before-seen peptide labels. |
| Researcher Affiliation | Collaboration | Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA; Talus Bioscience, Seattle, WA, USA; Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA; Department of Genome Sciences, University of Washington, Seattle, WA, USA. |
| Pseudocode | No | The paper describes the Casanovo model architecture and training process in text and with a diagram, but it does not include formal pseudocode or an algorithm block. |
| Open Source Code | Yes | Casanovo's source code and trained model weights are available as open source under the Apache 2.0 license at https://github.com/Noble-Lab/casanovo. |
| Open Datasets | Yes | To evaluate the performance of Casanovo and compare it with state-of-the-art de novo peptide sequencing methods, we use the nine-species benchmark data set and evaluation framework first introduced by (Tran et al., 2017) and used in several subsequent studies (Karunratanakul et al., 2019; Qiao et al., 2021). |
| Dataset Splits | Yes | Following (Tran et al., 2017), we employ a leave-one-out cross-validation framework where we train a model on eight species and test on the held-out species, for each of the nine species in the data set. In each case, we split the training set 90/10 for training and validation (see the split sketch after the table). |
| Hardware Specification | Yes | Models are trained on 2 RTX 2080 GPUs for 30 epochs... Casanovo also runs inference at a faster rate of 119 spectra/s on an RTX 2080, compared to DeepNovo's 36 spectra/s and PointNovo's reported 20 spectra/s on an RTX 2080 Ti (a comparatively faster GPU). |
| Software Dependencies | No | The paper mentions 'spectrum_utils (Bittremieux, 2020)' for visualization, but it does not provide specific version numbers for the key software libraries or frameworks used to implement and run Casanovo (e.g., Python, PyTorch/TensorFlow versions). |
| Experiment Setup | Yes | We train models with nine layers, embedding size d = 512, and eight attention heads, yielding a total of 47M model parameters (see the parameter-count sketch after the table). A batch size of 32 spectra and a weight decay of 10⁻⁵ are used during training, with a peak learning rate of 5 × 10⁻⁴. The learning rate is linearly increased from zero to its peak value over 100k warm-up steps, followed by a cosine-shaped decay (sketched below). Models are trained on 2 RTX 2080 GPUs for 30 epochs, which takes approximately two days, and model weights from the epoch with the lowest validation loss were selected for testing. These model hyperparameters (number of layers, embedding size, number of attention heads, and learning rate schedule) are used for all downstream experiments unless otherwise specified. |
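The leave-one-species-out protocol quoted in the Dataset Splits row can be made concrete with a short sketch. This is a minimal illustration, not code from the Casanovo repository: the function name, the `(species, spectrum)` pair layout, and the seed are all hypothetical assumptions.

```python
import random
from collections import defaultdict

def nine_species_splits(spectra, seed=0):
    """Yield (train, val, test) for each leave-one-species-out fold.

    `spectra` is assumed to be an iterable of (species, spectrum) pairs;
    the real benchmark stores annotated spectra per species.
    """
    by_species = defaultdict(list)
    for species, spectrum in spectra:
        by_species[species].append(spectrum)

    rng = random.Random(seed)
    for held_out in sorted(by_species):
        # Spectra from the other eight species form the training pool.
        pool = [s for sp in by_species if sp != held_out
                for s in by_species[sp]]
        rng.shuffle(pool)
        n_val = len(pool) // 10  # 90/10 train/validation split
        yield pool[n_val:], pool[:n_val], by_species[held_out]
```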
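The 47M parameter figure in the Experiment Setup row can be sanity-checked with stock PyTorch. The sketch below is not Casanovo's actual model (which encodes spectrum peaks rather than a token sequence); the feed-forward dimension of 1024 is an assumption not stated in the quoted text, chosen because it reproduces the reported count with nine encoder and nine decoder layers.

```python
import torch.nn as nn

# Vanilla transformer with the stated depth, width, and head count.
model = nn.Transformer(
    d_model=512,           # embedding size from the paper
    nhead=8,               # eight attention heads
    num_encoder_layers=9,
    num_decoder_layers=9,
    dim_feedforward=1024,  # assumed; not given in the quoted setup
)
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")  # ~47.3M, close to the reported 47M
```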
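Likewise, the warm-up-then-cosine learning rate schedule maps onto a standard PyTorch `LambdaLR`. This is a sketch under assumptions, not Casanovo's training code: the optimizer choice (AdamW) and the total step count are invented for illustration, since the quoted setup specifies 30 epochs rather than a step budget.

```python
import math
import torch
from torch.optim.lr_scheduler import LambdaLR

PEAK_LR = 5e-4          # peak learning rate from the paper
WARMUP_STEPS = 100_000  # linear warm-up length from the paper
TOTAL_STEPS = 300_000   # assumed; depends on dataset size and batch size

model = torch.nn.Linear(512, 512)  # stand-in for the full model
optimizer = torch.optim.AdamW(
    model.parameters(), lr=PEAK_LR, weight_decay=1e-5  # optimizer type assumed
)

def warmup_cosine(step: int) -> float:
    """Multiplier on PEAK_LR: linear ramp from zero, then cosine decay."""
    if step < WARMUP_STEPS:
        return step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = LambdaLR(optimizer, lr_lambda=warmup_cosine)
# During training, call scheduler.step() after each optimizer.step().
```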