A Model to Search for Synthesizable Molecules

Authors: John Bradshaw, Brooks Paige, Matt J. Kusner, Marwin Segler, José Miguel Hernández-Lobato

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section we evaluate MOLECULE CHEF in (1) its ability to generate a diverse set of valid molecules; (2) how useful its learnt latent space is when optimizing product molecules for some property; and (3) whether by training a regressor back from product molecules to the latent space, MOLECULE CHEF can be used as part of a setup to perform retrosynthesis. ... The results are shown in Table 1.
Researcher Affiliation | Collaboration | John Bradshaw (University of Cambridge; MPI for Intelligent Systems; jab255@cam.ac.uk); Brooks Paige (University of Cambridge; The Alan Turing Institute; bpaige@turing.ac.uk); Matt J. Kusner (University College London; The Alan Turing Institute; m.kusner@ucl.ac.uk); Marwin H. S. Segler (Benevolent AI; Westfälische Wilhelms-Universität Münster; marwin.segler@benevolent.ai); José Miguel Hernández-Lobato (University of Cambridge; The Alan Turing Institute; Microsoft Research Cambridge; jmh233@cam.ac.uk)
Pseudocode | Yes | Algorithm 1: MOLECULE CHEF's Decoder
Open Source Code | Yes | Further details can also be found in our appendix and code is available at https://github.com/john-bradshaw/molecule-chef
Open Datasets | Yes | In order to train our model we need a dataset of reactant bags. For this we use the USPTO dataset [31], processed and cleaned up by Jin et al. [19].
Dataset Splits | Yes | We filter our training (using Jin et al. [19]'s split) dataset so that each reaction only contains reactants that occur at least 15 times across different reactions in the original larger training USPTO dataset. ... We evaluate on a filtered version of Jin et al. [19]'s test set split of USPTO, where we have filtered out any reactions which have the exact same reactant and product multisets as a reaction present in the set used to train Molecule Chef. In addition, we further split this filtered set into two sets: (i) "Reachable Products", which are reactions in the test set that contain as reactants only molecules that are in MOLECULE CHEF's reactant vocabulary, and (ii) "Unreachable Products", which have at least one reactant molecule that is not in the vocabulary.
Hardware Specification | No | The paper does not provide specific details regarding the hardware (e.g., CPU, GPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions software such as 'RDKit' and 'Molecular Transformer', but it does not give version numbers for these or any other software dependencies, and these versions are needed to replicate the experiments exactly.
Experiment Setup | No | The paper mentions architectural details such as '4 layer Gated Graph Neural Networks (GGNN)' and 'a 2 hidden layer property predictor NN', and a weighting factor 'λ = 10'. However, it does not provide full training details such as learning rates, batch sizes, optimizer settings, or number of epochs, which are crucial for full reproducibility.
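The dataset-split procedure described in the Dataset Splits row can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' code (their implementation is in the linked repository). The assumptions are: reactions are represented as hypothetical (reactant SMILES list, product SMILES list) pairs; a reactant repeated inside one reaction is counted once towards the 15-occurrence threshold; and the reactant vocabulary is taken to be every reactant left in the filtered training reactions. The helper names multiset_key, filter_training_set and split_test_set are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

# Hypothetical representation of one reaction: (reactant SMILES, product SMILES).
Reaction = Tuple[List[str], List[str]]
Key = Tuple[Tuple[str, ...], Tuple[str, ...]]

MIN_OCCURRENCES = 15  # threshold quoted in the paper


def multiset_key(reaction: Reaction) -> Key:
    """Order-independent key for the (reactant multiset, product multiset) pair.

    Sorting keeps duplicates, so two reactions match only if their
    multisets (not just their sets) of reactants and products are equal.
    """
    reactants, products = reaction
    return tuple(sorted(reactants)), tuple(sorted(products))


def filter_training_set(
    train: Iterable[Reaction], min_count: int = MIN_OCCURRENCES
) -> Tuple[List[Reaction], Set[str]]:
    """Keep reactions whose reactants all occur in >= min_count distinct reactions."""
    reactions = list(train)
    counts: Counter = Counter()
    for reactants, _ in reactions:
        # set(): a reactant repeated within one reaction counts once (assumption).
        counts.update(set(reactants))
    frequent = {smi for smi, n in counts.items() if n >= min_count}
    kept = [r for r in reactions if all(smi in frequent for smi in r[0])]
    # Assumption: the vocabulary is every reactant left in the kept reactions.
    vocab = {smi for reactants, _ in kept for smi in reactants}
    return kept, vocab


def split_test_set(
    test: Iterable[Reaction], train_kept: List[Reaction], vocab: Set[str]
) -> Tuple[List[Reaction], List[Reaction]]:
    """Drop exact training duplicates, then split into reachable / unreachable."""
    seen = {multiset_key(r) for r in train_kept}
    filtered = [r for r in test if multiset_key(r) not in seen]
    reachable: List[Reaction] = []
    unreachable: List[Reaction] = []
    for reaction in filtered:
        # "Reachable Products": every reactant is in the reactant vocabulary.
        if all(smi in vocab for smi in reaction[0]):
            reachable.append(reaction)
        else:
            unreachable.append(reaction)
    return reachable, unreachable
```

The sketch compares raw strings, so in practice the SMILES would first need to be canonicalized (for example with RDKit) so that the same molecule always maps to the same string.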