Projecting Molecules into Synthesizable Chemical Spaces

Authors: Shitong Luo, Wenhao Gao, Zuofan Wu, Jian Peng, Connor W. Coley, Jianzhu Ma

ICML 2024

Reproducibility assessment: each variable below is listed with its result and the supporting LLM response.
Research Type: Experimental
LLM response: "In this section, we first examine the model's capability of finding synthesis pathways for molecules that are known to be synthesizable, and explore the generalization ability of the models (Section 4.1). Then, we study the application of the model in three drug design settings: structure-based drug design (Section 4.2), goal-directed generation (Section 4.3), and hit expansion (Section 4.4). Our model produces valid synthetic pathways for most test cases and thus achieves a high success rate (Table 1)."
Researcher Affiliation: Collaboration
LLM response: "1 Helixon Research, 2 Massachusetts Institute of Technology, 3 Tsinghua University."
Pseudocode: Yes
LLM response: "Algorithm 1: Construct a (postfix notation, molecular graph) pair for training."
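Algorithm 1 itself is not reproduced here, but the core idea of serializing a synthesis tree into postfix notation can be sketched. The following is a minimal, hypothetical illustration, not the paper's implementation: the node structure, token names, and example tree are all invented for clarity. A post-order traversal emits each reactant subtree before the reaction that consumes it.

```python
# Hypothetical sketch of serializing a synthesis tree into postfix notation
# via post-order traversal. All names and the example tree are illustrative,
# not taken from the paper.
from dataclasses import dataclass, field


@dataclass
class SynthesisNode:
    token: str  # building-block SMILES or reaction-template ID
    children: list["SynthesisNode"] = field(default_factory=list)


def to_postfix(node: SynthesisNode) -> list[str]:
    """Emit every reactant subtree before the reaction that consumes it."""
    tokens: list[str] = []
    for child in node.children:
        tokens.extend(to_postfix(child))
    tokens.append(node.token)
    return tokens


# Amide coupling of two building blocks:
tree = SynthesisNode("rxn:amide_coupling", [
    SynthesisNode("CC(=O)O"),      # carboxylic acid building block
    SynthesisNode("NCc1ccccc1"),   # amine building block
])
print(to_postfix(tree))  # ['CC(=O)O', 'NCc1ccccc1', 'rxn:amide_coupling']
```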
Open Source Code: Yes
LLM response: "The code and data of this project are available at https://github.com/luost26/ChemProjector."
Open Datasets: Yes
LLM response: "We use the building blocks in the Enamine US Stock catalog retrieved in October 2023 (Enamine, 2023). For building blocks that include more than one molecule (e.g., salts, hydrates), we keep the largest molecule and drop the remaining ones... In addition, we include another challenging benchmark: 1,000 molecules from the ChEMBL database (Release 33) (Gaulton et al., 2012), reported to be predominantly unreachable in previous work (Gao et al., 2021)."
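The salt/hydrate filtering step quoted above can be approximated with RDKit's standardization utilities. The sketch below assumes "largest molecule" means the largest covalent fragment; the example SMILES is illustrative, not from the Enamine catalog.

```python
# Sketch of the salt/hydrate filtering step, assuming "largest molecule"
# means the largest covalent fragment. The example SMILES is illustrative.
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize

chooser = rdMolStandardize.LargestFragmentChooser()


def keep_largest_fragment(smiles: str) -> str:
    mol = Chem.MolFromSmiles(smiles)
    return Chem.MolToSmiles(chooser.choose(mol))


print(keep_largest_fragment("Cl.NCc1ccccc1"))  # drops the HCl counterion
```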
Dataset Splits: Yes
LLM response: "First, we clustered building blocks into 128 groups using the K-means algorithm based on their fingerprints, reserving one cluster exclusively for testing while using the remaining 127 for training."
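A sketch of how such a split could be implemented with RDKit and scikit-learn. The quote does not specify the fingerprint used for clustering, so the Morgan settings below (radius 2, 256 bits, borrowed from the experiment-setup quote further down) are an assumption, and the building-block list is a toy stand-in.

```python
# Sketch of the described split: cluster building blocks by fingerprint with
# K-means and hold out one cluster for testing. Fingerprint settings are an
# assumption; the building-block list is a toy stand-in.
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem
from sklearn.cluster import KMeans


def fingerprint(smiles: str, n_bits: int = 256) -> np.ndarray:
    mol = Chem.MolFromSmiles(smiles)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius=2, nBits=n_bits)
    arr = np.zeros((n_bits,), dtype=np.float32)
    DataStructs.ConvertToNumpyArray(fp, arr)
    return arr


building_blocks = ["CC(=O)O", "NCc1ccccc1", "c1ccccc1O", "CCN"]  # toy stand-ins
X = np.stack([fingerprint(s) for s in building_blocks])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)  # paper uses k=128
held_out = kmeans.labels_ == 0  # reserve one cluster exclusively for testing
train_blocks = [s for s, h in zip(building_blocks, held_out) if not h]
test_blocks = [s for s, h in zip(building_blocks, held_out) if h]
```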
Hardware Specification: No
LLM response: "The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments."
Software Dependencies: No
LLM response: "The paper mentions RDKit and the SMARTS string format (Daylight Chemical Information Systems, Inc., 2023) but does not provide specific version numbers for these or other software dependencies necessary for replication."
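To illustrate the dependency in question: reaction templates in this line of work are typically encoded as reaction SMARTS and executed with RDKit. The template below is a generic amide-coupling pattern chosen for illustration; it is not taken from the paper's template set.

```python
# Illustrative use of the RDKit + SMARTS dependency: applying a generic
# amide-coupling reaction template (not from the paper's template set).
from rdkit import Chem
from rdkit.Chem import AllChem

rxn = AllChem.ReactionFromSmarts("[C:1](=[O:2])[OH].[N;H2:3]>>[C:1](=[O:2])[N:3]")
acid = Chem.MolFromSmiles("CC(=O)O")
amine = Chem.MolFromSmiles("NCc1ccccc1")
products = rxn.RunReactants((acid, amine))
print(Chem.MolToSmiles(products[0][0]))  # CC(=O)NCc1ccccc1
```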
Experiment Setup: Yes
LLM response: "The molecular graph encoder consists of 8 graph transformer layers, each of which has 8 attention heads, and the dimension of the input and output features is 512. The postfix notation decoder is a stack of 8 transformer decoder layers. Each has 8 attention heads, and the dimension of the input and output features is also 512. In the nearest-neighbor search, each building block molecule is indexed by the Morgan fingerprint (Morgan, 1965) of length 256 and radius 2. We use the AdamW optimizer (Loshchilov & Hutter, 2017) to train the network with a learning rate of 3e-4 and a batch size of 256 for 500,000 iterations. During the construction of syntheses for training, we limit the maximum number of reactions to 5 and the maximum number of atoms to 80. At inference time, we set the maximum sequence length to 16."
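The quoted hyperparameters translate into roughly the following PyTorch configuration. This is a stub, not the authors' model: the paper's encoder is a graph transformer rather than a plain nn.TransformerDecoder, so only the stated sizes and optimizer settings are mirrored here.

```python
# Stub mirroring the quoted hyperparameters in PyTorch. Not the authors'
# model: the paper's encoder is a graph transformer; only the decoder stack
# and optimizer settings are sketched here.
import torch
from torch import nn

d_model, n_heads, n_layers = 512, 8, 8  # feature dim, attention heads, layers

decoder = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model=d_model, nhead=n_heads, batch_first=True),
    num_layers=n_layers,
)

optimizer = torch.optim.AdamW(decoder.parameters(), lr=3e-4)
batch_size, n_iterations = 256, 500_000
max_reactions, max_atoms = 5, 80  # limits when constructing training syntheses
max_seq_len = 16                  # maximum postfix length at inference time
```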