Learning to Navigate The Synthetically Accessible Chemical Space Using Reinforcement Learning

Authors: Sai Krishna Gottipati, Boris Sattarov, Sufeng Niu, Yashaswi Pathak, Haoran Wei, Shengchao Liu, Simon Blackburn, Karam J. Thomas, Connor Coley, Jian Tang, Sarath Chandar, Yoshua Bengio

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | PGFS achieves state-of-the-art performance in generating structures with high QED and clogP. Moreover, we validate PGFS in an in-silico proof-of-concept associated with three HIV targets.
Researcher Affiliation | Collaboration | Sai Krishna Gottipati *1, Boris Sattarov *1, Sufeng Niu 2, Yashaswi Pathak 1 3, Haoran Wei 1 4, Shengchao Liu 5 6, Karam J. Thomas 1, Simon Blackburn 6, Connor W. Coley 7, Jian Tang 8 6 9, Sarath Chandar 10 8 6, Yoshua Bengio 5 11 8 6
Pseudocode | Yes | Algorithm 1: PGFS
Open Source Code | Yes | The HIV target activity datasets, predictive QSAR models, and prediction scripts can be found at this URL: https://github.com/99andBeyond/Apollo1060. The full list of SMILES of the building blocks can be found in the GitHub repository of this work.
Open Datasets | Yes | The HIV target activity datasets, predictive QSAR models, and prediction scripts can be found at this URL: https://github.com/99andBeyond/Apollo1060. The full datasets used for QSAR modeling are provided in the GitHub repository.
Dataset Splits | Yes | The validation set consists of 2,000 randomly chosen initial reactants R(1) drawn from the set of 150,560 available reactants.
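The split described above (2,000 validation reactants held out of 150,560 building blocks) can be sketched as a simple uniform random sample. This is a hypothetical illustration: the paper does not specify a random seed, and integer IDs stand in for the actual reactant SMILES strings.

```python
import random

NUM_REACTANTS = 150_560   # size of the building-block set (from the review)
VAL_SIZE = 2_000          # held-out initial reactants R(1) (from the review)

random.seed(0)  # assumed seed; not stated in the paper

all_ids = list(range(NUM_REACTANTS))          # stand-ins for reactant SMILES
val_ids = set(random.sample(all_ids, VAL_SIZE))
train_ids = [i for i in all_ids if i not in val_ids]

print(len(val_ids), len(train_ids))  # 2000 148560
```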
Hardware Specification | No | The paper describes the model architecture and training process but does not provide specific details about the hardware used for experiments, such as CPU or GPU models, or memory specifications.
Software Dependencies | Yes | RDKit's RunReactants function (version 2019.03.1)
Experiment Setup | Yes | The f network uses four fully connected layers with 256, 128, and 128 neurons in the hidden layers. The π network uses four fully connected layers with 256, 256, and 167 neurons in the hidden layers. All hidden layers use ReLU activation, whereas the final layer uses tanh activation. Similarly, the Q network uses four fully connected layers with 256, 64, and 16 neurons in the hidden layers, with ReLU activation for all hidden layers and linear activation for the final layer. We use the Adam optimizer to train all networks, with a learning rate of 1e-4 for the f and π networks and 3e-4 for the Q network. Further, we use a discount factor γ = 0.99, a mini-batch size of 32, and a soft-update weight τ = 0.005 for the target networks.
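The layer widths quoted above can be made concrete with a minimal numpy sketch of the three MLPs. This is not the authors' implementation: the input dimensions (state fingerprint size, template count) and the output sizes of the π and Q heads are assumptions chosen only to make the four-layer shapes runnable.

```python
import numpy as np

def mlp(sizes, hidden_act, final_act):
    """Toy MLP: sizes = [in, h1, ..., out]; returns a forward function."""
    rng = np.random.default_rng(0)
    weights = [rng.standard_normal((a, b)) * 0.01
               for a, b in zip(sizes[:-1], sizes[1:])]

    def forward(x):
        for i, w in enumerate(weights):
            x = x @ w
            act = final_act if i == len(weights) - 1 else hidden_act
            x = act(x)
        return x
    return forward

relu = lambda x: np.maximum(x, 0)
tanh = np.tanh
linear = lambda x: x

# Hidden widths from the review; input/output sizes are assumptions.
STATE = 1024   # assumed reactant-fingerprint dimension
TEMPL = 64     # assumed number of reaction templates

f_net = mlp([STATE, 256, 128, 128, TEMPL], relu, tanh)            # f: state -> template
pi_net = mlp([STATE + TEMPL, 256, 256, 167, 167], relu, tanh)     # π: -> continuous action
q_net = mlp([STATE + TEMPL + 167, 256, 64, 16, 1], relu, linear)  # Q: -> scalar value

print(f_net(np.zeros((1, STATE))).shape)  # (1, 64)
```

Each `mlp` call builds exactly four fully connected layers (one weight matrix per adjacent size pair), matching the "four fully connected layers with three hidden widths" phrasing in the setup description.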