Representing Molecules as Random Walks Over Interpretable Grammars

Authors: Michael Sun, Minghao Guo, Weize Yuan, Veronika Thost, Crystal Elaine Owens, Aristotle Franklin Grosz, Sharvaa Selvan, Katelyn Zhou, Hassan Mohiuddin, Benjamin J Pedretti, Zachary P Smith, Jie Chen, Wojciech Matusik

ICML 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate clear advantages over existing methods in terms of performance, efficiency, and synthesizability of predicted molecules, and we provide detailed insights into the method's chemical interpretability. Code is available at https://github.com/shiningsunnyday/polymer_walk. ... Our experiments quantitatively answer the following questions: 1) How well does our method perform on property prediction for our setting of interest? 2) How well does our representation work for the generation of novel molecules, compared with both SOTA symbolic and deep molecular generative models?
Researcher Affiliation	Collaboration	(1) MIT CSAIL; (2) MIT Chemistry; (3) MIT-IBM Watson AI Lab, IBM Research; (4) MIT Chemical Engineering; (5) MIT; (6) Wellesley.
Pseudocode	Yes	Algorithm 1: function extract_walk(D, B) ... Algorithm 2: function traverse_dag(Gi, G) ... Algorithm 3: function build_motif_graph(V) ... Algorithm 4: function re_order(childs) ... Algorithm 5: function dfs_walk(cur, traj) ... Algorithm 6: function algo-diffusion ... Algorithm 7: function generate
Open Source Code	Yes	Code is available at https://github.com/shiningsunnyday/polymer_walk.
Open Datasets	Yes	Group Contribution (GC) (Wang et al., 2018; Park & Paul, 1997; Wu et al., 2021). ... The Harvard organic photovoltaic dataset (HOPV) (Lopez et al., 2016). ... Predictive Toxicology Challenge (PTC) (Helma et al., 2001).
Dataset Splits	No	No explicit statement of a separate validation split. The paper states: 'For each (dataset, property) pair, we perform an 80-20 train-test split over 3 random seeds and report the mean and standard deviation.'
Hardware Specification	No	No specific hardware details like exact GPU/CPU models, processor types, or memory amounts are provided. The only mention related to hardware is 'For example, for the datasets we study, it is done under a minute when parallelized across 100 CPU cores', which does not specify the type of CPUs.
Software Dependencies	No	The paper mentions 'RDKit package (Landrum, 2016)', 'XGBoost', and 'GIN (Xu et al., 2019)' but does not provide specific version numbers for any of these software dependencies.
Experiment Setup	Yes	Table 8. Hyperparameter settings for property prediction. Hyperparameter: Value. Number of layers: 5, Activation: ReLU, Hidden dimension: 16, Motif featurization: Morgan fingerprint, Motif feature dimension: 2048, Input feature dimension: 5 × 2048 + 2048 + |G|, Batch size: 1, Learning rate: 1e-3.
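
The evaluation protocol quoted under Dataset Splits (an 80-20 train-test split over 3 random seeds, reporting mean and standard deviation) could be reproduced roughly as in the sketch below. This is not taken from the authors' repository. The seed values, the helper names split_80_20 and evaluate_over_seeds, the score_fn callback, and the use of the sample standard deviation are all assumptions made for illustration.

```python
import random
import statistics


def split_80_20(n_items, seed):
    """Shuffle indices 0..n_items-1 with a seeded RNG and cut at 80%."""
    rng = random.Random(seed)
    indices = list(range(n_items))
    rng.shuffle(indices)
    cut = (n_items * 4) // 5  # integer arithmetic avoids float rounding
    return indices[:cut], indices[cut:]


def evaluate_over_seeds(n_items, score_fn, seeds=(0, 1, 2)):
    """Run one split per seed and summarise the resulting test scores.

    score_fn is a hypothetical callback: it receives (train_idx, test_idx),
    fits a model on the training indices, and returns one test-set score.
    The seed values are placeholders; the paper does not list its seeds.
    """
    scores = []
    for seed in seeds:
        train_idx, test_idx = split_80_20(n_items, seed)
        scores.append(score_fn(train_idx, test_idx))
    # Sample standard deviation (n - 1); the paper does not say which
    # variant it reports, so this choice is an assumption.
    return statistics.mean(scores), statistics.stdev(scores)
```

Because no validation split is described, any hyperparameter selection would have to happen inside score_fn on the training indices only. Otherwise the reported test scores would be optimistic.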