Spanning Tree-based Graph Generation for Molecules

Authors: Sungsoo Ahn, Binghong Chen, Tianzhe Wang, Le Song

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on QM9, ZINC250k, and MOSES benchmarks verify the effectiveness of the proposed framework in metrics such as validity, Fréchet ChemNet distance, and fragment similarity. We also demonstrate the usefulness of STGG in maximizing penalized LogP value of molecules.
Researcher Affiliation | Collaboration | POSTECH, Georgia Institute of Technology, BioMap, MBZUAI; sungsoo.ahn@postech.ac.kr, {binghong, tianzhe}@gatech.edu, dasongle@gmail.com
Pseudocode | Yes | We provide the full algorithm in Algorithm 1.
Open Source Code | Yes | We submit the full implementation of our STGG framework and the baselines used in our experiments as a supplementary material.
Open Datasets | Yes | We experiment on popular graph generation benchmarks of QM9, ZINC250K, and MOSES to validate the effectiveness of our algorithm.
Dataset Splits | No | We train our generative model on the respective datasets and sample 10,000 molecules to measure (a) the ratio of valid molecules (VALID), (b) the ratio of unique molecules (UNIQUE), and (c) the ratio of novel molecules with respect to the training dataset (NOVEL). This describes metrics evaluated on generated samples rather than the train/validation/test splits used for training (a sketch of these metrics appears after this table). The paper also states that "The similarity metrics of FCD, SNN, FRAG, SCAF are measured with respect to the test dataset of molecules"; while this refers to a test set, it does not specify the full train/validation/test splits or how a validation set was used for model tuning.
Hardware Specification | Yes | Using a single Quadro RTX 6000 GPU, it takes approximately three, ten, and 96 hours to fully train the models on QM9, ZINC250K, and MOSES datasets, respectively.
Software Dependencies | No | The paper mentions the "AdamW (Loshchilov & Hutter, 2019) optimizer" and refers to "Transformer-related configurations" from Vaswani et al. (2017), but it does not specify software versions for programming languages, libraries, or frameworks (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup | Yes | For all the experiments, we train the Transformer under the STGG framework for 100 epochs with a batch size of 128 for all the datasets. We use the AdamW (Loshchilov & Hutter, 2019) optimizer with a constant learning rate of 10⁻⁴. We use three and six Transformer layers for {QM9, ZINC250K} and MOSES, respectively. The rest of the Transformer-related configurations follow those of the original work (Vaswani et al., 2017); we use the attention module with an embedding size of 1024 and eight heads, an MLP with dimension 2048, and dropout with probability 0.1. A configuration sketch matching these hyperparameters appears below.
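The VALID, UNIQUE, and NOVEL ratios quoted in the Dataset Splits row can be recomputed from generated molecules. The following is a minimal sketch assuming RDKit is available and that both the generated samples and the training set are given as lists of SMILES strings; the function and variable names are illustrative and not taken from the STGG codebase.

```python
# Hypothetical sketch of the VALID / UNIQUE / NOVEL metrics described above.
# Assumes RDKit is installed; names here are illustrative, not the authors' code.
from rdkit import Chem


def generation_metrics(generated_smiles, train_smiles):
    """Compute validity, uniqueness, and novelty ratios for generated SMILES."""
    # A molecule counts as valid if RDKit can parse and sanitize its SMILES.
    valid = [s for s in generated_smiles if Chem.MolFromSmiles(s) is not None]
    valid_ratio = len(valid) / len(generated_smiles)

    # Uniqueness is measured over canonical SMILES of the valid molecules.
    canonical = {Chem.MolToSmiles(Chem.MolFromSmiles(s)) for s in valid}
    unique_ratio = len(canonical) / max(len(valid), 1)

    # Novelty is the fraction of unique molecules absent from the training set.
    train_canonical = {
        Chem.MolToSmiles(m)
        for m in (Chem.MolFromSmiles(s) for s in train_smiles)
        if m is not None
    }
    novel_ratio = len(canonical - train_canonical) / max(len(canonical), 1)

    return valid_ratio, unique_ratio, novel_ratio
```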
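The hyperparameters quoted in the Experiment Setup row map onto a standard Transformer configuration. Below is a minimal sketch in which a vanilla PyTorch TransformerEncoder stands in for the STGG model; the vocabulary size and the Sequential wrapper are assumptions, and the authors' actual model is an autoregressive decoder over spanning-tree tokens with causal masking, which this sketch omits.

```python
# Sketch of the reported training configuration (not the authors' implementation).
import torch
import torch.nn as nn

VOCAB_SIZE = 128   # assumption: size of the spanning-tree token vocabulary
NUM_LAYERS = 3     # 3 layers for QM9/ZINC250K, 6 for MOSES (per the paper)

encoder_layer = nn.TransformerEncoderLayer(
    d_model=1024,          # embedding size of 1024
    nhead=8,               # eight attention heads
    dim_feedforward=2048,  # MLP dimension of 2048
    dropout=0.1,           # dropout probability of 0.1
    batch_first=True,
)
model = nn.Sequential(
    # embedding + encoder stack; a real decoder would add causal masking
    nn.Embedding(VOCAB_SIZE, 1024),
    nn.TransformerEncoder(encoder_layer, num_layers=NUM_LAYERS),
    nn.Linear(1024, VOCAB_SIZE),
)

# AdamW with a constant learning rate of 1e-4, as stated in the paper.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
EPOCHS, BATCH_SIZE = 100, 128  # 100 epochs, batch size 128 for all datasets
```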