Searching for High-Value Molecules Using Reinforcement Learning and Transformers

Authors: Raj Ghugare, Santiago Miret, Adriana Hugessen, Mariano Phielipp, Glen Berseth

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through extensive experiments, we explore how different design choices for text grammar and algorithmic choices for training can affect an RL policy's ability to generate molecules with desired properties. We arrive at a new RL-based molecular design algorithm (ChemRLformer) and perform a thorough analysis using 25 molecule design tasks, including computationally complex protein docking simulations.
Researcher Affiliation | Collaboration | Raj Ghugare (1,2), Santiago Miret (3), Adriana Hugessen (1,2), Mariano Phielipp (3), Glen Berseth (1,2); (1) Université de Montréal, (2) Mila Quebec AI Institute, (3) Intel Labs
Pseudocode | No | The paper describes algorithmic details (e.g., the policy gradient algorithm and reward functions) but does not include any explicit pseudocode blocks or algorithms labeled 'Pseudocode' or 'Algorithm'.
Open Source Code | No | Upon acceptance, we will open-source our code and release the pretrained weights to support reproducible research.
Open Datasets | Yes | To advance effectively within this vast search space, we make use of datasets containing a large number of drug-like molecules in text format (Irwin et al., 2012; Sterling and Irwin, 2015b; Mendez et al., 2019).
Dataset Splits | No | The paper mentions 'validation loss' ('On the ZINC 250K SMILES dataset, the FC, the RNN and the transformer model achieved a validation loss of 29.417, 22.507, and 22.923 respectively.') but does not specify the explicit proportions or sizes of the training, validation, or test dataset splits.
Hardware Specification | Yes | All transformers were trained for 5 epochs, with the largest batch size that we could fit in the memory of a single NVIDIA RTX A6000 GPU, for example, a batch size of 2048 for pretraining the transformer on the ZINC 100M dataset.
Software Dependencies | Yes | For experiments that apply SELFIES, we convert all datasets to SELFIES using the Python API provided by (Krenn et al., 2020) (Version: 2.1.1).
Experiment Setup | Yes | All models used an initial learning rate of 1e-3, with a cosine learning rate schedule (Loshchilov and Hutter, 2017). FC and RNNs used a batch size of 128 and were trained for 10 epochs. All transformers were trained for 5 epochs, with the largest batch size that we could fit in the memory of a single NVIDIA RTX A6000 GPU, for example, a batch size of 2048 for pretraining the transformer on the ZINC 100M dataset.
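The Pseudocode row notes that the paper describes a policy-gradient algorithm and reward functions without giving explicit pseudocode. As a rough illustration only, the sketch below shows one REINFORCE-style update for a token-level molecule generator; the policy interface, tokenizer, reward_fn, and mean baseline are assumptions for this sketch, not the paper's exact ChemRLformer algorithm.

```python
import torch

def reinforce_update(policy, tokenizer, reward_fn, optimizer, batch_size=64, max_len=100):
    """One hedged REINFORCE-style update for a string-generating molecule policy."""
    sequences, log_probs = [], []
    for _ in range(batch_size):
        tokens, seq_log_prob = [tokenizer.bos_id], torch.tensor(0.0)
        for _ in range(max_len):
            # Assumed interface: policy returns [batch, seq_len, vocab] logits.
            logits = policy(torch.tensor([tokens]))[0, -1]
            dist = torch.distributions.Categorical(logits=logits)
            action = dist.sample()
            seq_log_prob = seq_log_prob + dist.log_prob(action)
            tokens.append(int(action))
            if int(action) == tokenizer.eos_id:
                break
        sequences.append(tokenizer.decode(tokens))
        log_probs.append(seq_log_prob)

    # Score completed molecule strings with the property oracle (e.g., a docking score);
    # rewards are treated as non-differentiable scalars.
    rewards = torch.tensor([reward_fn(s) for s in sequences])
    baseline = rewards.mean()  # simple variance-reduction baseline (an assumption)
    loss = -((rewards - baseline) * torch.stack(log_probs)).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(rewards.mean())
```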
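The Software Dependencies row cites the SELFIES Python API (Krenn et al., 2020), version 2.1.1, for converting SMILES datasets to SELFIES. A minimal usage sketch of that conversion step is shown below; the example molecules are arbitrary and not taken from the paper's datasets.

```python
# SMILES -> SELFIES conversion with the selfies package (pip install selfies==2.1.1).
import selfies as sf

smiles_dataset = ["CCO", "c1ccccc1", "CC(=O)Oc1ccccc1C(=O)O"]  # ethanol, benzene, aspirin

# Encode each SMILES string into its SELFIES representation.
selfies_dataset = [sf.encoder(s) for s in smiles_dataset]

# SELFIES strings split into discrete symbols (the policy's vocabulary) and
# decode back to valid SMILES.
for s in selfies_dataset:
    tokens = list(sf.split_selfies(s))
    print(s, "->", sf.decoder(s), f"({len(tokens)} tokens)")
```

A property of SELFIES relevant to the paper's grammar comparison is that any sequence of SELFIES symbols decodes to a syntactically valid molecule, whereas arbitrary SMILES token sequences may be invalid.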
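The Experiment Setup row lists an initial learning rate of 1e-3 with a cosine schedule, a batch size of 128 for 10 epochs for the FC and RNN models, and the largest batch that fits in GPU memory (up to 2048) for 5 epochs for the transformers. The sketch below wires those numbers into a generic PyTorch pretraining loop; the optimizer choice (Adam), the model's loss interface, and the data pipeline are assumptions not stated in the quoted text.

```python
import torch
from torch.utils.data import DataLoader

def pretrain(model, dataset, batch_size=128, epochs=10, lr=1e-3, device="cuda"):
    """Hedged sketch of the quoted pretraining hyperparameters."""
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # optimizer is an assumption
    # Cosine learning-rate schedule (Loshchilov and Hutter, 2017), annealed over all steps.
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer, T_max=epochs * len(loader)
    )
    model.to(device).train()
    for _ in range(epochs):
        for tokens in loader:  # batches of tokenized molecule strings
            tokens = tokens.to(device)
            # Assumed interface: the model returns a next-token cross-entropy loss.
            loss = model(tokens)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            scheduler.step()
    return model
```

For the transformer configuration described in the row, this would be called with epochs=5 and the largest batch_size that fits on the GPU (e.g., 2048 for ZINC 100M).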