Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
A Model to Search for Synthesizable Molecules
Authors: John Bradshaw, Brooks Paige, Matt J. Kusner, Marwin Segler, José Miguel Hernández-Lobato
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section we evaluate MOLECULE CHEF in (1) its ability to generate a diverse set of valid molecules; (2) how useful its learnt latent space is when optimizing product molecules for some property; and (3) whether by training a regressor back from product molecules to the latent space, MOLECULE CHEF can be used as part of a setup to perform retrosynthesis. ... The results are shown in Table 1. |
| Researcher Affiliation | Collaboration | John Bradshaw (University of Cambridge; MPI for Intelligent Systems); Brooks Paige (University of Cambridge; The Alan Turing Institute); Matt J. Kusner (University College London; The Alan Turing Institute); Marwin H. S. Segler (Benevolent AI; Westfälische Wilhelms-Universität Münster); José Miguel Hernández-Lobato (University of Cambridge; The Alan Turing Institute; Microsoft Research Cambridge) |
| Pseudocode | Yes | Algorithm 1 MOLECULE CHEF's Decoder |
| Open Source Code | Yes | Further details can also be found in our appendix and code is available at https://github.com/john-bradshaw/molecule-chef |
| Open Datasets | Yes | In order to train our model we need a dataset of reactant bags. For this we use the USPTO dataset [31], processed and cleaned up by Jin et al. [19]. |
| Dataset Splits | Yes | We filter our training (using Jin et al. [19]'s split) dataset so that each reaction only contains reactants that occur at least 15 times across different reactions in the original larger training USPTO dataset. ... We evaluate on a filtered version of Jin et al. [19]'s test set split of USPTO, where we have filtered out any reactions which have the exact same reactant and product multisets as a reaction present in the set used to train Molecule Chef. In addition, we further split this filtered set into two sets: (i) "Reachable Products", which are reactions in the test set that contain as reactants only molecules that are in MOLECULE CHEF's reactant vocabulary, and (ii) "Unreachable Products", which have at least one reactant molecule that is not in the vocabulary. |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions software such as 'RDKit' and 'Molecular Transformer', but it does not specify version numbers for these or any other software dependencies, which would be needed to replicate the experiments. |
| Experiment Setup | No | The paper mentions architectural details such as '4 layer Gated Graph Neural Networks (GGNN)' and 'a 2 hidden layer property predictor NN', and a weighting factor 'λ = 10'. However, it does not provide comprehensive experimental setup details like learning rates, batch sizes, specific optimizer settings, or number of epochs, which are crucial for full reproducibility. |
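
The filtering and splitting procedure quoted in the Dataset Splits row can be illustrated with a minimal sketch. This is not the authors' code: the function names (`build_vocabulary`, `filter_training`, `split_test`), the representation of a reaction as a pair of reactant and product lists, and the toy single-letter molecules are all assumptions made for illustration. It also assumes that "occur at least 15 times across different reactions" means counting each reactant once per reaction, which is one reading of the quoted text.

```python
from collections import Counter


def reaction_key(reactants, products):
    """Order-independent key: sorted tuples stand in for multisets."""
    return (tuple(sorted(reactants)), tuple(sorted(products)))


def build_vocabulary(reactions, min_count=15):
    """Reactants that appear in at least `min_count` different reactions."""
    counts = Counter()
    for reactants, _products in reactions:
        counts.update(set(reactants))  # count once per reaction (assumption)
    return {mol for mol, n in counts.items() if n >= min_count}


def filter_training(reactions, vocab):
    """Keep reactions whose reactants are all in the vocabulary."""
    return [(rs, ps) for rs, ps in reactions if all(r in vocab for r in rs)]


def split_test(test_reactions, train_reactions, vocab):
    """Drop test reactions duplicated in training, then split by reachability."""
    seen = {reaction_key(rs, ps) for rs, ps in train_reactions}
    reachable, unreachable = [], []
    for rs, ps in test_reactions:
        if reaction_key(rs, ps) in seen:
            continue  # same reactant and product multisets as a training reaction
        if all(r in vocab for r in rs):
            reachable.append((rs, ps))
        else:
            unreachable.append((rs, ps))
    return reachable, unreachable


# Toy data: letters are placeholders, not real molecules.
train = [
    (["A", "B"], ["P1"]),
    (["A", "C"], ["P2"]),
    (["B", "A"], ["P3"]),
    (["D", "A"], ["P4"]),
]
test = [
    (["B", "A"], ["P1"]),  # duplicate of a kept training reaction
    (["A", "B"], ["P5"]),  # all reactants in the vocabulary
    (["A", "E"], ["P6"]),  # "E" is outside the vocabulary
]

# A threshold of 2 is used here only so the toy data is non-trivial;
# the paper's quoted threshold is 15.
vocab = build_vocabulary(train, min_count=2)
train_kept = filter_training(train, vocab)
reachable, unreachable = split_test(test, train_kept, vocab)
```

In a real pipeline the molecules would need a canonical form (for example canonical SMILES) before counting, so that two spellings of the same molecule are treated as one; that step is omitted here.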