Learning to Navigate The Synthetically Accessible Chemical Space Using Reinforcement Learning
Authors: Sai Krishna Gottipati, Boris Sattarov, Sufeng Niu, Yashaswi Pathak, Haoran Wei, Shengchao Liu, Simon Blackburn, Karam Thomas, Connor Coley, Jian Tang, Sarath Chandar, Yoshua Bengio
ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | PGFS achieves state-of-the-art performance in generating structures with high QED and clogP. Moreover, we validate PGFS in an in-silico proof-of-concept associated with three HIV targets. |
| Researcher Affiliation | Collaboration | Sai Krishna Gottipati *1, Boris Sattarov *1, Sufeng Niu 2, Yashaswi Pathak 1 3, Haoran Wei 1 4, Shengchao Liu 5 6, Karam J. Thomas 1, Simon Blackburn 6, Connor W. Coley 7, Jian Tang 8 6 9, Sarath Chandar 10 8 6, Yoshua Bengio 5 11 8 6 |
| Pseudocode | Yes | Algorithm 1 PGFS |
| Open Source Code | Yes | The HIV targets activity datasets used, predictive QSAR models and prediction scripts can be found at this url: https://github.com/99andBeyond/Apollo1060. The full list of SMILES of the building blocks can be found in the github repository of this work. |
| Open Datasets | Yes | The HIV targets activity datasets used, predictive QSAR models and prediction scripts can be found at this url: https://github.com/99andBeyond/Apollo1060. The full datasets used for QSAR modeling are provided in the github repository. |
| Dataset Splits | Yes | The validation set constitutes 2,000 randomly chosen R(1)s (initial reactants) from the set of 150,560 available reactants. (A split sketch follows the table.) |
| Hardware Specification | No | The paper describes the model architecture and training process but does not provide specific details about the hardware used for experiments, such as CPU or GPU models, or memory specifications. |
| Software Dependencies | Yes | RDKit's RunReactants function (version 2019.03.1). (A usage sketch follows the table.) |
| Experiment Setup | Yes | The f network uses four fully connected layers with 256, 128, 128 neurons in the hidden layers. The π network uses four fully connected layers with 256, 256, 167 neurons in the hidden layers. All the hidden layers use ReLU activation whereas the final layer uses tanh activation. Similarly, the Q network also uses four fully connected layers with 256, 64, 16 neurons in the hidden layers, with ReLU activation for all the hidden layers and linear activation for the final layer. We use the Adam optimizer to train all the networks with a learning rate of 1e-4 for the f and π networks and 3e-4 for the Q network. Further, we used a discount factor γ = 0.99, minibatch size = 32, and soft update weight for target networks τ = 0.005. (A configuration sketch follows the table.) |
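
The dataset-split row reports that 2,000 of the 150,560 building-block reactants are held out as initial reactants R(1) for validation. Below is a minimal sketch of such a random hold-out; the file name, the random seed, and the absence of any stratification are assumptions, not details taken from the paper.

```python
import random

# Hypothetical file of building-block SMILES; the actual list ships with the paper's GitHub repository.
with open("building_blocks.smi") as fh:
    reactants = [line.strip() for line in fh if line.strip()]

random.seed(0)  # seed is an assumption; the paper does not report one in this excerpt
validation_r1 = random.sample(reactants, 2000)         # 2,000 randomly chosen initial reactants R(1)
training_pool = list(set(reactants) - set(validation_r1))
```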
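The environment step applies a reaction template to the current molecule with RDKit's RunReactants. The sketch below uses a hypothetical amide-coupling SMARTS and illustrative reactants, not the paper's own templates or building blocks.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Hypothetical two-component amide-coupling template (not one of the paper's templates).
rxn = AllChem.ReactionFromSmarts("[C:1](=[O:2])[OH].[N;!H0:3]>>[C:1](=[O:2])[N:3]")

acid = Chem.MolFromSmiles("CC(=O)O")        # illustrative reactant R(1): acetic acid
amine = Chem.MolFromSmiles("NCc1ccccc1")    # illustrative reactant R(2): benzylamine

# RunReactants returns a tuple of product tuples, one entry per matched substructure pairing.
for products in rxn.RunReactants((acid, amine)):
    print(Chem.MolToSmiles(products[0]))    # e.g. CC(=O)NCc1ccccc1
```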
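The experiment-setup row specifies layer widths, activations, and optimizer settings, so a compact configuration can be reconstructed. The PyTorch sketch below is illustrative only: the input and output dimensions, and exactly which quantities each network consumes, are placeholders rather than values stated in this excerpt.

```python
import torch
import torch.nn as nn

def mlp(sizes, out_act=nn.Identity):
    """Fully connected stack: ReLU on every hidden layer, out_act on the final layer."""
    layers = []
    for i in range(len(sizes) - 1):
        act = nn.ReLU if i < len(sizes) - 2 else out_act
        layers += [nn.Linear(sizes[i], sizes[i + 1]), act()]
    return nn.Sequential(*layers)

# Placeholder dimensions; the excerpt above does not state them.
STATE_DIM, TEMPLATE_DIM, ACTION_DIM = 1024, 128, 64

f_net  = mlp([STATE_DIM, 256, 128, 128, TEMPLATE_DIM], out_act=nn.Tanh)               # f: hidden 256, 128, 128; tanh output
pi_net = mlp([STATE_DIM + TEMPLATE_DIM, 256, 256, 167, ACTION_DIM], out_act=nn.Tanh)  # pi: hidden 256, 256, 167; tanh output
q_net  = mlp([STATE_DIM + TEMPLATE_DIM + ACTION_DIM, 256, 64, 16, 1])                 # Q: hidden 256, 64, 16; linear output

# Optimizer and training hyperparameters as reported in the setup row.
opt_f  = torch.optim.Adam(f_net.parameters(),  lr=1e-4)
opt_pi = torch.optim.Adam(pi_net.parameters(), lr=1e-4)
opt_q  = torch.optim.Adam(q_net.parameters(),  lr=3e-4)
GAMMA, BATCH_SIZE, TAU = 0.99, 32, 0.005  # discount factor, minibatch size, target-network soft-update weight
```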