Chemical-Reaction-Aware Molecule Representation Learning

Authors: Hongwei Wang, Weijiang Li, Xiaomeng Jin, Kyunghyun Cho, Heng Ji, Jiawei Han, Martin D. Burke

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 3 EXPERIMENTS: Experimental results demonstrate that our method achieves state-of-the-art performance in a variety of downstream tasks, e.g., reaction product prediction, molecule property prediction, reaction classification, and graph-edit-distance prediction.
Researcher Affiliation | Collaboration | Hongwei Wang¹, Weijiang Li¹, Xiaomeng Jin¹, Kyunghyun Cho²,³, Heng Ji¹, Jiawei Han¹, Martin D. Burke¹; ¹University of Illinois Urbana-Champaign, ²New York University, ³Genentech. {hongweiw, wl13, xjin17, hengji, hanj, mdburke}@illinois.edu, kyunghyun.cho@nyu.edu
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks clearly labeled or formatted as such.
Open Source Code | Yes | The code is available at https://github.com/hwwang55/MolR.
Open Datasets | Yes | We use reactions from USPTO granted patents collected by Lowe (2012) as the dataset, which is further cleaned by Zheng et al. (2019a). We evaluate MolR on five datasets: BBBP, HIV, BACE, Tox21, and ClinTox, proposed by Wu et al. (2018). We randomly sample 10,000 molecule pairs from the first 1,000 molecules in the QM9 dataset (Wu et al., 2018).
Dataset Splits | Yes | The dataset contains 478,612 chemical reactions and is split into training, validation, and test sets of 408,673, 29,973, and 39,966 reactions, respectively, so we refer to this dataset as USPTO-479k. All datasets are split into training, validation, and test sets by 8:1:1. (A split sketch follows the table.)
Hardware Specification | Yes | The average time cost per epoch and the maximal memory cost of MolR-GCN when varying the batch size (run on an NVIDIA V100 GPU).
Software Dependencies | No | The implementation of GNNs is based on Deep Graph Library (DGL). We use pysmiles to parse the SMILES strings of molecules into NetworkX graphs. We use the Adam optimizer and a Logistic Regression model implemented in scikit-learn. However, specific version numbers for these software components are not provided. (A parsing sketch follows the table.)
Experiment Setup | Yes | The number of layers for all GNNs is 2, the output dimension of all layers is 1,024, and the READOUT function is sum. The margin γ is set to 4. We train the model for 20 epochs with a batch size of 4,096, using the Adam (Kingma & Ba, 2015) optimizer with a learning rate of 10^-4. (A training-setup sketch follows the table.)
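
The 8:1:1 split reported for the property prediction datasets is easy to reproduce in spirit. Below is a minimal sketch; the function name `split_811` and the seeded random shuffle are assumptions, and the authors' released code fixes the actual split indices.

```python
# Minimal sketch of an 8:1:1 random split (assumed scheme, not the authors' code).
import random

def split_811(items, seed=42):
    """Shuffle and split a collection into 80% train, 10% validation, 10% test."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = int(0.8 * len(items))
    n_val = int(0.1 * len(items))
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test

# Note: the reported USPTO-479k counts (408,673 / 29,973 / 39,966 summing to
# 478,612) are a fixed split, not 8:1:1; the 8:1:1 ratio applies to the five
# molecule property datasets.
train, val, test = split_811(range(478_612))
print(len(train), len(val), len(test))  # 382889 47861 47862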
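
The dependency chain flagged above (pysmiles for parsing, NetworkX as the intermediate graph, DGL for the GNN) can be sketched as follows. This is a hypothetical pipeline, not the authors' code: the `smiles_to_dgl` helper and the one-hot `ATOM_TYPES` vocabulary are assumptions, and exact behavior may vary across the unpinned library versions the assessment notes.

```python
# Hypothetical pipeline matching the named dependencies: pysmiles -> NetworkX -> DGL.
import dgl
import torch
from pysmiles import read_smiles

ATOM_TYPES = ['C', 'N', 'O', 'F', 'S', 'Cl', 'Br', 'P', 'I', 'B']  # assumed vocabulary

def smiles_to_dgl(smiles):
    """Parse a SMILES string into a DGL graph with one-hot atom features."""
    nx_graph = read_smiles(smiles)  # NetworkX graph; each node carries an 'element' label
    eye = torch.eye(len(ATOM_TYPES))
    for node, data in nx_graph.nodes(data=True):
        idx = ATOM_TYPES.index(data['element']) if data['element'] in ATOM_TYPES else 0
        nx_graph.nodes[node]['feat'] = eye[idx]
    # DGL converts each undirected bond into two directed edges.
    return dgl.from_networkx(nx_graph, node_attrs=['feat'])

g = smiles_to_dgl('CCO')             # ethanol: 3 heavy atoms, 2 bonds
print(g.num_nodes(), g.num_edges())  # 3 4
```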
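
Finally, a hedged sketch of the reported training setup: a 2-layer GNN (a GCN variant is shown) with 1,024-dimensional layers, sum readout, margin γ = 4, and Adam at learning rate 1e-4. The `GCNEncoder` and `margin_loss` names, the triplet-style loss form, and the negative-product sampling are assumptions consistent with the paper's reaction-as-translation framing, not its exact formulation.

```python
# Sketch of the reported configuration under the assumptions stated above.
import dgl
import torch
import torch.nn.functional as F
from dgl.nn import GraphConv

class GCNEncoder(torch.nn.Module):
    def __init__(self, in_dim, hidden_dim=1024, n_layers=2):
        super().__init__()
        dims = [in_dim] + [hidden_dim] * n_layers
        self.layers = torch.nn.ModuleList(
            GraphConv(dims[i], dims[i + 1], allow_zero_in_degree=True)
            for i in range(n_layers))

    def forward(self, g, feat):
        h = feat
        for layer in self.layers:
            h = F.relu(layer(g, h))
        g.ndata['h'] = h
        return dgl.sum_nodes(g, 'h')  # READOUT = sum, as reported

def margin_loss(h_reactants, h_products, h_neg_products, gamma=4.0):
    """Assumed triplet form: pull matched reactant/product embeddings together,
    push mismatched pairs at least gamma apart."""
    pos = (h_reactants - h_products).pow(2).sum(dim=1)
    neg = (h_reactants - h_neg_products).pow(2).sum(dim=1)
    return F.relu(pos - neg + gamma).mean()

# Reported optimizer settings: Adam, lr = 1e-4, batch size 4,096, 20 epochs.
# encoder = GCNEncoder(in_dim=10)
# optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-4)
```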