Prefix-Tree Decoding for Predicting Mass Spectra from Molecules
Authors: Samuel Goldman, John Bradshaw, Jiayi Xin, Connor Coley
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show promising empirical results on mass spectra prediction tasks. We evaluate SCARF on spectra prediction (§4.2) and molecule identification in a retrieval task (§4.3). |
| Researcher Affiliation | Academia | Samuel Goldman, Computational and Systems Biology, MIT, Cambridge, MA 02139, samlg@mit.edu; John Bradshaw, Chemical Engineering, MIT, Cambridge, MA 02139, jbrad@mit.edu; Jiayi Xin, Statistics and Actuarial Science, The University of Hong Kong, Pokfulam, Hong Kong, xinjiayi@connect.hku.hk; Connor W. Coley, Chemical Engineering and Electrical Engineering and Computer Science, MIT, Cambridge, MA 02139, ccoley@mit.edu |
| Pseudocode | Yes | Algorithm A.1: Pseudo-code for SCARF-Thread, which generates prefix trees from a root node autoregressively, one level at a time. (A minimal sketch of this level-wise expansion appears after the table.) |
| Open Source Code | Yes | model code can be found at https://github.com/samgoldman97/ms-pred. |
| Open Datasets | Yes | We train and validate SCARF on two libraries: a gold standard commercial tandem mass spectrometry dataset, NIST20 [35], as well as a more heterogeneous public dataset, NPLIB1, extracted from the GNPS database [48] by Dührkop et al. [14] and subsequently processed by Goldman et al. [19]. |
| Dataset Splits | Yes | Both datasets are evaluated using a structure-disjoint 90%/10% train/test split with 10% of training data held out for validation, such that all compounds in the test set are not seen in the train and validation sets. (A sketch of such a split appears after the table.) |
| Hardware Specification | Yes | We train each of our models on a single RTX A5000 NVIDIA GPU (CUDA Version 11.6), making use of the Torch Lightning [15] library to manage the training. |
| Software Dependencies | Yes | We train each of our models on a single RTX A5000 NVIDIA GPU (CUDA Version 11.6), making use of the Torch Lightning [15] library to manage the training. PyTorch: An imperative style, high-performance deep learning library [36]. |
| Experiment Setup | Yes | Parameters are detailed in Table A10. (Table A10 provides specific values for learning rate, dropout, hidden size, layers, batch size, weight decay, etc.) |
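
As a rough illustration of the level-wise generation described in Algorithm A.1, the sketch below grows a prefix tree over element counts one element (one tree level) at a time. The element ordering, the `score_children` stand-in, and the `top_k` pruning here are hypothetical placeholders; in SCARF these decisions come from a learned neural network conditioned on the input molecule.

```python
# Minimal sketch of level-wise prefix-tree generation in the spirit of
# SCARF-Thread (Algorithm A.1). Not the authors' implementation.
from dataclasses import dataclass, field

ELEMENTS = ["C", "H", "N", "O"]  # illustrative ordering; SCARF uses a fuller vocabulary

@dataclass
class Node:
    prefix: tuple                               # element counts fixed so far, e.g. (6, 5)
    children: list = field(default_factory=list)

def score_children(prefix, element, max_count):
    # Hypothetical stand-in: in SCARF a neural network scores which counts
    # of `element` to keep, given the molecule and the current prefix.
    return list(range(max_count + 1))

def build_prefix_tree(precursor_counts, top_k=3):
    """Grow the tree autoregressively, one level (one element) at a time."""
    root = Node(prefix=())
    frontier = [root]
    for element in ELEMENTS:
        next_frontier = []
        for node in frontier:
            counts = score_children(node.prefix, element, precursor_counts[element])
            for c in counts[:top_k]:            # beam-style pruning per node
                child = Node(prefix=node.prefix + (c,))
                node.children.append(child)
                next_frontier.append(child)
        frontier = next_frontier
    # Leaves enumerate candidate product formulae (one count per element).
    return root, [leaf.prefix for leaf in frontier]

# Toy usage: a phenol-like precursor, C6H6O.
root, formulae = build_prefix_tree({"C": 6, "H": 6, "N": 0, "O": 1})
print(len(formulae), formulae[:3])
```

The property this sketch mirrors is that the number of decoding steps scales with the number of tree levels (elements), not with the number of leaf formulae, which is what makes prefix-tree decoding efficient.

Similarly, the structure-disjoint split can be illustrated by grouping records on a compound identifier before splitting, so that no compound's spectra leak across splits. The sketch below assumes each record carries an InChIKey field; the paper's exact grouping key and seed handling may differ.

```python
# Minimal sketch of a structure-disjoint 90%/10% train/test split with
# 10% of training compounds held out for validation. Assumed schema only.
import random

def structure_disjoint_split(records, key=lambda r: r["inchikey"], seed=0):
    compounds = sorted({key(r) for r in records})   # unique structures
    rng = random.Random(seed)
    rng.shuffle(compounds)

    n_test = int(0.10 * len(compounds))             # 10% of compounds to test
    test_set = set(compounds[:n_test])
    train_pool = compounds[n_test:]
    n_val = int(0.10 * len(train_pool))             # 10% of train held out
    val_set = set(train_pool[:n_val])

    splits = {"train": [], "val": [], "test": []}
    for r in records:
        k = key(r)
        name = "test" if k in test_set else "val" if k in val_set else "train"
        splits[name].append(r)
    return splits

# Toy usage with hypothetical records (50 compounds, 4 spectra each):
records = [{"inchikey": f"CMPD{i % 50}", "spectrum": None} for i in range(200)]
splits = structure_disjoint_split(records)
print({k: len(v) for k, v in splits.items()})
```

Because the shuffle and cut happen over compound identifiers rather than individual spectra, every spectrum of a given compound lands in exactly one split, matching the paper's requirement that test compounds are unseen during training and validation.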