Convolutional Networks on Graphs for Learning Molecular Fingerprints

Authors: David K. Duvenaud, Dougal Maclaurin, Jorge Aguilera-Iparraguirre, Rafael Gómez-Bombarelli, Timothy Hirzel, Alán Aspuru-Guzik, Ryan P. Adams

NeurIPS 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental (4 experiments) | "We ran two experiments to demonstrate that neural fingerprints with large random weights behave similarly to circular fingerprints." and, on predictive accuracy: "We compared the performance of circular fingerprints and neural graph fingerprints under two conditions. In the first condition, predictions were made by a linear layer using the fingerprints as input; in the second condition, by a one-hidden-layer neural network using the fingerprints as input. In all settings, all differentiable parameters in the composed models were optimized simultaneously. Results are summarized in Table 4.2." (A minimal sketch of the two prediction heads appears below the table.)
Researcher Affiliation | Academia | "David Duvenaud, Dougal Maclaurin, Jorge Aguilera-Iparraguirre, Rafael Gómez-Bombarelli, Timothy Hirzel, Alán Aspuru-Guzik, Ryan P. Adams. Harvard University"
Pseudocode | Yes | "Figure 2: Pseudocode of circular fingerprints (left) and neural graph fingerprints (right). Differences are highlighted in blue. Every non-differentiable operation is replaced with a differentiable analog." (A sketch of the differentiable analog appears below the table.)
Open Source Code | Yes | "Code for computing neural fingerprints and producing visualizations is available at github.com/HIPS/neural-fingerprint."
Open Datasets | Yes | "Solubility: The aqueous solubility of 1144 molecules as measured by [4]." "Drug efficacy: The half-maximal effective concentration (EC50) in vitro of 10,000 molecules against a sulfide-resistant strain of P. falciparum, the parasite that causes malaria, as measured by [5]." "Organic photovoltaic efficiency: The Harvard Clean Energy Project [8] uses expensive DFT simulations to estimate the photovoltaic efficiency of organic molecules. We used a subset of 20,000 molecules from this dataset."
Dataset Splits | No | "relu had a slight but consistent performance advantage on the validation set." and "To optimize hyperparameters, we used random search. The hyperparameters of all methods were optimized using 50 trials for each cross-validation fold." The paper mentions a validation set and cross-validation but does not provide the split ratios or the methodology for creating the splits needed for reproduction.
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments.
Software Dependencies | No | "Our pipeline takes as input the SMILES [30] string encoding of each molecule, which is then converted into a graph using RDKit [20]." and "Since we required relatively complex control flow and indexing in order to implement variants of Algorithm 2, we used a more flexible automatic differentiation package for Python called Autograd (github.com/HIPS/autograd). This package handles standard Numpy [18] code, and can differentiate code containing while loops, branches, and indexing." The paper lists RDKit, Autograd, and Numpy but does not provide specific version numbers for any of them. (A sketch of the SMILES-to-graph step appears below the table.)
Experiment Setup | Yes | "Training used batch normalization [11]." "relu had a slight but consistent performance advantage on the validation set." "Each experiment optimized for 10000 minibatches of size 100 using the Adam algorithm [13], a variant of RMSprop that includes momentum." "The hyperparameters of all methods were optimized using 50 trials for each cross-validation fold. The following hyperparameters were optimized: log learning rate, log of the initial weight scale, the log L2 penalty, fingerprint length, fingerprint depth (up to 6), and the size of the hidden layer in the fully-connected network." Additionally, the size of the hidden feature vector in the convolutional neural fingerprint networks was optimized. (A random-search sketch appears below the table.)
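
The two prediction conditions from the Research Type row amount to two small heads on top of the fingerprint vector. Below is a minimal NumPy sketch under assumed shapes; the function names (linear_head, mlp_head) are hypothetical, and in the paper these heads are trained jointly with the fingerprint network rather than in isolation.

```python
import numpy as np

def linear_head(fp, w, b):
    # Condition 1: predictions from a linear layer on the fingerprint.
    # fp: (fp_len,), w: (fp_len,), b: scalar
    return fp @ w + b

def mlp_head(fp, W1, b1, w2, b2):
    # Condition 2: predictions from a one-hidden-layer network on the fingerprint.
    h = np.maximum(0.0, fp @ W1 + b1)  # relu hidden layer
    return h @ w2 + b2
```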
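
The differentiable analog described in the Pseudocode row can be sketched as follows. This is an illustrative NumPy reading of Figure 2 (right), not the authors' reference implementation: the paper uses separate hidden weight matrices per layer and per atom degree, while this sketch uses a single matrix per layer, and all names here are hypothetical.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def neural_fingerprint(atom_feats, neighbors, H, W):
    """Sketch of a neural graph fingerprint.

    atom_feats : (n_atoms, d) array of initial atom features
    neighbors  : list of neighbor-index lists, one per atom
    H          : list of (d, d) hidden weight matrices, one per layer
    W          : list of (d, fp_len) output weight matrices, one per layer
    """
    fp = np.zeros(W[0].shape[1])
    feats = atom_feats
    for layer in range(len(H)):
        new_feats = np.empty_like(feats)
        for a in range(len(neighbors)):
            # Sum the atom's features with its neighbors' (replaces concatenation).
            v = feats[a] + feats[neighbors[a]].sum(axis=0)
            # Smooth nonlinear update (replaces the discrete hash function).
            new_feats[a] = np.tanh(v @ H[layer])
            # Soft bit allocation via softmax (replaces discrete index-setting).
            fp = fp + softmax(new_feats[a] @ W[layer])
        feats = new_feats
    return fp
```

Because the hash is replaced by a smooth nonlinearity and the discrete bit-setting by a softmax, the whole computation is differentiable end to end, which is what lets all parameters be optimized simultaneously.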
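
The SMILES-to-graph step quoted in the Software Dependencies row can be illustrated with RDKit. This is a minimal sketch: the paper's atom features (such as element, degree, and aromaticity) are richer than the bare symbols extracted here, and downstream the authors use Autograd to differentiate plain Numpy code written over these structures.

```python
from rdkit import Chem

def smiles_to_graph(smiles):
    """Convert a SMILES string into (atom symbols, adjacency list)."""
    mol = Chem.MolFromSmiles(smiles)
    atoms = [atom.GetSymbol() for atom in mol.GetAtoms()]
    neighbors = [[] for _ in atoms]
    for bond in mol.GetBonds():
        i, j = bond.GetBeginAtomIdx(), bond.GetEndAtomIdx()
        neighbors[i].append(j)
        neighbors[j].append(i)
    return atoms, neighbors

atoms, neighbors = smiles_to_graph("CCO")  # ethanol -> ['C', 'C', 'O']
```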
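
A random-search loop over the hyperparameters listed in the Experiment Setup row might look like the sketch below. The sampling ranges and the validation_error stub are assumptions for illustration; the paper states the search space dimensions but not the ranges.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_hyperparams():
    """Draw one random-search trial; ranges are illustrative assumptions."""
    return {
        "learning_rate": 10 ** rng.uniform(-4, -1),   # log learning rate
        "init_scale":    10 ** rng.uniform(-4, -1),   # log initial weight scale
        "l2_penalty":    10 ** rng.uniform(-6, -1),   # log L2 penalty
        "fp_length":     int(rng.integers(16, 513)),  # fingerprint length
        "fp_depth":      int(rng.integers(1, 7)),     # fingerprint depth (up to 6)
        "hidden_size":   int(rng.integers(32, 513)),  # fully-connected hidden layer size
        "conv_width":    int(rng.integers(8, 129)),   # hidden feature vector size
    }

def validation_error(hp):
    # Placeholder: train with these hyperparameters, return validation error.
    return rng.random()

# 50 trials per cross-validation fold, keeping the best configuration.
best = min((sample_hyperparams() for _ in range(50)), key=validation_error)
```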