Learning to Extend Molecular Scaffolds with Structural Motifs

Authors: Krzysztof Maziarz, Henry Richard Jackson-Flux, Pashmina Cameron, Finton Sirockin, Nadine Schneider, Nikolaus Stiefl, Marwin Segler, Marc Brockschmidt

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that MoLeR performs comparably to state-of-the-art methods on unconstrained molecular optimization tasks, and outperforms them on scaffold-based tasks, while being an order of magnitude faster to train and sample from than existing approaches.
Researcher Affiliation | Industry | Krzysztof Maziarz (Microsoft Research, United Kingdom); Henry Jackson-Flux (Microsoft Research, United Kingdom); Pashmina Cameron (Microsoft Research, United Kingdom); Finton Sirockin (Novartis, Switzerland); Nadine Schneider (Novartis, Switzerland); Nikolaus Stiefl (Novartis, Switzerland); Marwin Segler (Microsoft Research, United Kingdom); Marc Brockschmidt (Microsoft Research, United Kingdom)
Pseudocode | Yes | Algorithm 1 (MoLeR's Generative Procedure) and Algorithm 2 (Determining a generation order)
Open Source Code | Yes | Code is available at https://github.com/microsoft/molecule-generation. (Usage sketch below.)
Open Datasets | Yes | We use training data from GuacaMol (Brown et al., 2019), which released a curated set of 1.5M drug-like molecules, divided into train, validation and test sets.
Dataset Splits | Yes | We use training data from GuacaMol (Brown et al., 2019), which released a curated set of 1.5M drug-like molecules, divided into train, validation and test sets. (Loading sketch below.)
Hardware Specification | Yes | For all measurements in Table 1, we used a machine with a single Tesla K80 GPU.
Software Dependencies | Yes | Our own implementations (MoLeR, CGVAE) are based on TensorFlow 2 (Abadi et al., 2016), while the models of Jin et al. (2018; 2020) (JT-VAE, HierVAE) use PyTorch (Paszke et al., 2019).
Experiment Setup | Yes | We train our model using the Adam optimizer (Kingma & Ba, 2014). We found that adding an initial warm-up phase for the KL loss coefficient λ_prior (i.e. increasing it from 0 to a target value over the course of training) helps to stabilize the model. ... We cap the total number of nodes rather than the total number of molecules, as that is more robust to varying sizes of molecules in the training data. ... For vocabulary sizes up to 32 we used λ_prior = 0.01, and then followed the logarithmic trend described here. (Training-detail sketch below.)
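Usage sketch (Open Source Code row). The released repository (microsoft/molecule-generation) exposes a Python API for sampling from, encoding with, and decoding from a trained MoLeR checkpoint. The snippet below is a minimal sketch based on the repository's documented entry point; `load_model_at_directory`, `sample`, `encode`, `decode`, and the `scaffolds` keyword are taken from the README as I recall it and may differ between package versions, and the model directory path is a placeholder.

```python
# Minimal sketch of using the released package
# (https://github.com/microsoft/molecule-generation).
# Function/method names follow the repository README and may differ by version.
from molecule_generation import load_model_at_directory

MODEL_DIR = "./moler_checkpoint"  # placeholder: directory containing a trained MoLeR model

with load_model_at_directory(MODEL_DIR) as model:
    # Draw new molecules from the prior.
    sampled_smiles = model.sample(10)

    # Round-trip molecules through the latent space.
    latents = model.encode(["c1ccccc1", "CCO"])
    decoded = model.decode(latents)

    # Scaffold-constrained decoding: the i-th scaffold constrains the i-th latent.
    decoded_with_scaffold = model.decode(latents, scaffolds=["c1ccccc1", "CC"])

print(sampled_smiles)
print(decoded)
print(decoded_with_scaffold)
```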
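Loading sketch (Open Datasets / Dataset Splits rows). The GuacaMol data referenced above is distributed as plain SMILES files, one molecule per line, already split into train, validation, and test sets. A minimal loading sketch follows; the file names are assumptions based on the public GuacaMol v1 release and may need adjusting to match a local copy.

```python
from pathlib import Path

# Assumed file names from the public GuacaMol v1 release; adjust if the local
# copy uses different names.
DATA_DIR = Path("./guacamol_data")
SPLITS = {
    "train": "guacamol_v1_train.smiles",
    "valid": "guacamol_v1_valid.smiles",
    "test": "guacamol_v1_test.smiles",
}

def load_split(name: str) -> list[str]:
    """Read one SMILES string per line for the given split."""
    path = DATA_DIR / SPLITS[name]
    return [line.strip() for line in path.open() if line.strip()]

splits = {name: load_split(name) for name in SPLITS}
print({name: len(smiles) for name, smiles in splits.items()})
```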
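Training-detail sketch (Experiment Setup row). Two details quoted above are easy to misread from prose alone: the warm-up that increases the KL coefficient λ_prior from 0 to its target value over training, and batching by a cap on the total number of nodes rather than a fixed number of molecules. The sketch below illustrates both in TensorFlow-2-style Python; the schedule shape (linear), the warm-up length, and the node cap are illustrative assumptions, not the paper's exact settings.

```python
import tensorflow as tf

# --- KL warm-up: anneal lambda_prior from 0 to its target value. ---
# The target (0.01 for vocabulary sizes up to 32, per the paper) is reached
# after `warmup_steps`; a linear schedule and this warm-up length are assumptions.
def kl_coefficient(step, target: float = 0.01, warmup_steps: int = 10_000) -> tf.Tensor:
    frac = tf.minimum(tf.cast(step, tf.float32) / warmup_steps, 1.0)
    return target * frac

# Inside a training step, the annealed coefficient weights the KL term:
#   loss = reconstruction_loss + kl_coefficient(step) * kl_divergence
# and the whole model is optimized with tf.keras.optimizers.Adam.

# --- Node-capped batching: group molecules until a total-node budget is hit. ---
def batch_by_node_count(graphs, max_nodes_per_batch: int = 25_000):
    """Yield batches whose total node count stays under the cap.

    `graphs` is any iterable of objects with a `num_nodes` attribute; the cap
    value here is illustrative, not the paper's setting.
    """
    batch, node_count = [], 0
    for graph in graphs:
        if batch and node_count + graph.num_nodes > max_nodes_per_batch:
            yield batch
            batch, node_count = [], 0
        batch.append(graph)
        node_count += graph.num_nodes
    if batch:
        yield batch
```

Capping nodes per batch rather than molecules per batch keeps memory use roughly constant even when molecule sizes vary widely, which is the robustness property the quoted passage refers to.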