Prefix-Tree Decoding for Predicting Mass Spectra from Molecules

Authors: Samuel Goldman, John Bradshaw, Jiayi Xin, Connor Coley

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show promising empirical results on mass spectra prediction tasks. We evaluate SCARF on spectra prediction (§4.2) and molecule identification in a retrieval task (§4.3).
Researcher Affiliation | Academia | Samuel Goldman, Computational and Systems Biology, MIT, Cambridge, MA 02139, samlg@mit.edu; John Bradshaw, Chemical Engineering, MIT, Cambridge, MA 02139, jbrad@mit.edu; Jiayi Xin, Statistics and Actuarial Science, The University of Hong Kong, Pokfulam, Hong Kong, xinjiayi@connect.hku.hk; Connor W. Coley, Chemical Engineering and Electrical Engineering and Computer Science, MIT, Cambridge, MA 02139, ccoley@mit.edu
Pseudocode | Yes | Algorithm A.1: Pseudo-code for SCARF-Thread, which generates prefix trees from a root node autoregressively, one level at a time. (An illustrative rendering of this loop is sketched after the table.)
Open Source Code | Yes | Model code can be found at https://github.com/samgoldman97/ms-pred.
Open Datasets | Yes | We train and validate SCARF on two libraries: a gold standard commercial tandem mass spectrometry dataset, NIST20 [35], as well as a more heterogeneous public dataset, NPLIB1, extracted from the GNPS database [48] by Dührkop et al. [14] and subsequently processed by Goldman et al. [19].
Dataset Splits | Yes | Both datasets are evaluated using a structure-disjoint 90%/10% train/test split with 10% of training data held out for validation, such that no compounds in the test set are seen in the train or validation sets. (A toy implementation of such a split is sketched after the table.)
Hardware Specification | Yes | We train each of our models on a single RTX A5000 NVIDIA GPU (CUDA Version 11.6), making use of the PyTorch Lightning [15] library to manage the training. (A minimal Lightning training setup is sketched after the table.)
Software Dependencies | Yes | We train each of our models on a single RTX A5000 NVIDIA GPU (CUDA Version 11.6), making use of the PyTorch Lightning [15] library to manage the training. PyTorch: An imperative style, high-performance deep learning library. [36]
Experiment Setup | Yes | Parameters are detailed in Table A10. (Table A10 provides specific values for learning rate, dropout, hidden size, layers, batch size, weight decay, etc.)
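
The SCARF-Thread pseudocode (Algorithm A.1) grows the prefix tree breadth-first: every frontier node proposes candidate children, and the surviving children become the next level's frontier. The Python sketch below is a hypothetical illustration of that level-by-level loop; `Node`, `score_children`, the toy token vocabulary, and the pruning threshold are invented stand-ins, not the authors' actual model or API.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    prefix: tuple                       # partial formula built so far
    children: list = field(default_factory=list)

def score_children(prefix):
    """Stand-in for the learned model: score candidate next tokens
    for a given prefix. Here, a fixed toy vocabulary (hypothetical)."""
    return {0: 0.9, 1: 0.5, 2: 0.1}     # token -> score

def grow_tree(root, max_depth, threshold=0.3):
    """Expand the prefix tree one level at a time: each frontier node
    proposes children, and only children scoring >= threshold are kept."""
    frontier = [root]
    for _ in range(max_depth):
        next_frontier = []
        for node in frontier:
            for token, score in score_children(node.prefix).items():
                if score >= threshold:
                    child = Node(prefix=node.prefix + (token,))
                    node.children.append(child)
                    next_frontier.append(child)
        frontier = next_frontier        # descend one level
    return root

tree = grow_tree(Node(prefix=()), max_depth=3)
```

In the real model, `score_children` would be a neural network scoring formula tokens conditioned on the input molecule; here it is a fixed toy distribution so the sketch runs standalone.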
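The structure-disjoint split quoted above partitions compounds rather than individual spectra, so no test-set structure leaks into training. A minimal sketch, assuming spectra are keyed by a compound identifier such as an InChIKey (an assumption; the quoted passage does not name the grouping key):

```python
import random

def structure_disjoint_split(spectra, seed=0):
    """spectra: list of (compound_key, spectrum) pairs.
    Assigns every compound (and all its spectra) to exactly one split:
    10% of compounds to test, then 10% of the remainder to validation."""
    keys = sorted({k for k, _ in spectra})
    random.Random(seed).shuffle(keys)
    n_test = int(0.10 * len(keys))
    n_val = int(0.10 * (len(keys) - n_test))
    test_keys = set(keys[:n_test])
    val_keys = set(keys[n_test:n_test + n_val])
    split = {"train": [], "val": [], "test": []}
    for key, spec in spectra:
        if key in test_keys:
            split["test"].append(spec)
        elif key in val_keys:
            split["val"].append(spec)
        else:
            split["train"].append(spec)
    return split
```

Because membership is decided per compound key, every spectrum of a given structure lands in the same split, which is what makes the split "structure-disjoint".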
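The hardware, software, and experiment-setup rows together describe a single-GPU PyTorch Lightning training run with the hyperparameters of Table A10. The sketch below shows how such a run is typically wired up; `SpectraModule`, its architecture, and every hyperparameter value are placeholders, not the values from Table A10 or the actual `ms-pred` code.

```python
import torch
import pytorch_lightning as pl

class SpectraModule(pl.LightningModule):
    """Hypothetical spectrum-prediction module; architecture and
    hyperparameter defaults are illustrative placeholders."""
    def __init__(self, hidden_size=256, lr=1e-3, weight_decay=1e-7, dropout=0.1):
        super().__init__()
        self.save_hyperparameters()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(2048, hidden_size),  # e.g. a fingerprint input
            torch.nn.ReLU(),
            torch.nn.Dropout(dropout),
            torch.nn.Linear(hidden_size, 1000),  # e.g. a binned-spectrum output
        )

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = torch.nn.functional.mse_loss(self.net(x), y)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(
            self.parameters(),
            lr=self.hparams.lr,
            weight_decay=self.hparams.weight_decay,
        )

# Single-GPU training, as in the hardware row above.
trainer = pl.Trainer(accelerator="gpu", devices=1, max_epochs=100)
# trainer.fit(SpectraModule(), train_dataloaders=...)  # supply a DataLoader
```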