Conversational Drug Editing Using Retrieval and Domain Feedback

Authors: Shengchao Liu, Jiongxiao Wang, Yijin Yang, Chengpeng Wang, Ling Liu, Hongyu Guo, Chaowei Xiao

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically show that ChatDrug reaches the best performance on all 39 drug editing tasks, encompassing small molecules, peptides, and proteins. We further demonstrate, through 10 case studies, that ChatDrug can successfully identify the key substructures for manipulation, generating diverse and valid suggestions for drug editing. Quantitatively, ChatDrug reaches the best performance on all 39 drug editing tasks compared to seven baselines, among which ChatDrug-Turbo reaches generally better performance and higher stability on 32 of them.
Researcher Affiliation | Academia | 1 University of California, Berkeley; 2 University of Wisconsin-Madison; 3 Arizona State University; 4 University of Illinois Urbana-Champaign; 5 Princeton University; 6 National Research Council Canada; 7 University of Ottawa
Pseudocode | No | The paper does not contain a pseudocode block or algorithm section.
Open Source Code | Yes | The codes and datasets can be found at this GitHub link.
Open Datasets | Yes | Data: Both the input molecules and retrieval DB are sampled from ZINC (Irwin et al., 2012): we sample 200 and 10K molecules (with SMILES strings) from ZINC as input molecules and retrieval DB, respectively. In this experiment, we use the experimental dataset of peptide-MHC binding affinities (O'Donnell et al., 2020). TAPE (Rao et al., 2019) is a benchmark for protein sequence property prediction, including the secondary structure prediction task. (A seeded-sampling sketch of the ZINC setup follows the table.)
Dataset Splits | No | The paper describes how data is used (e.g., 'sample 200 and 10K molecules... as input molecules and retrieval DB', 'take the test dataset and training dataset as the input proteins and retrieval DB respectively'), but it does not define explicit training/validation/test splits (no percentages or counts for distinct sets beyond 'input' and 'retrieval DB'). The framework is parameter-free, so there is no traditional training phase that would require such splits.
Hardware Specification | Yes | All of our experiments for ChatDrug-Turbo are conducted on a single NVIDIA RTX A5000 GPU. For open-source LLM backbones, ChatDrug-GALACTICA and ChatDrug-Llama2 need at least 2 NVIDIA RTX A5000 GPUs for small molecule editing and peptide editing. For protein editing tasks, due to the extra GPU usage for protein evaluation, 4 NVIDIA RTX A5000 GPUs are needed in our experiments.
Software Dependencies | No | The paper mentions specific LLM models ('gpt-3.5-turbo-0301', 'facebook/galactica-6.7b', 'meta-llama/Llama-2-7b-chat-hf') and tools/libraries (RDKit, MHCflurry 2.0, ProteinCLAP-EBM-NCE from ProteinDT, ESMFold, PyMOL) but does not specify their version numbers for reproducibility. (See the version-pinning sketch below the table.)
Experiment Setup | Yes | We also set the temperature to 0 to reduce the potential randomness in our experiments. Additionally, we observe that ChatGPT often generates repeated sequences or fails to stop generating sequences for chemistry-related questions. To mitigate this issue, we set the frequency_penalty to 0.2. Moreover, for improved adaptation to different domains, it is advisable to incorporate a system role prompt within ChatGPT. In our case, we utilize the following prompt: 'You are an expert in the field of molecular chemistry.' The threshold for each small molecule editing task is shown in Table 19, which holds for both functions. The experiment results of the PDDS Module are entirely deterministic. Any randomness observed in the ReDF Module and Conversation Module is due to the utilization of different seeds during the sampling of the retrieval database DB from ZINC for molecule editing. Specifically, for small molecule editing, we adopt seeds 0, 1, 2, 3, 4 for the main results in Tables 1 and 2, and seed 0 for the other ablation studies. (An API-call sketch of this configuration follows the table.)
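To make the Open Datasets row concrete, here is a minimal sketch of the seeded ZINC sampling (200 input molecules plus a 10K-molecule retrieval DB). The file name zinc_smiles.txt and the RDKit validity filter are assumptions for illustration, not the authors' exact pipeline.

```python
# Minimal sketch of the seeded ZINC sampling described above.
# Assumptions: a local file "zinc_smiles.txt" with one SMILES string per
# line, and RDKit as a validity filter; neither is specified in the paper.
import random
from rdkit import Chem

def sample_zinc(path: str, n_input: int = 200, n_db: int = 10_000, seed: int = 0):
    with open(path) as f:
        smiles = [line.strip() for line in f if line.strip()]
    # Keep only SMILES that RDKit can parse (assumed sanity check).
    smiles = [s for s in smiles if Chem.MolFromSmiles(s) is not None]
    rng = random.Random(seed)  # seeds 0-4 correspond to the paper's main runs
    pool = rng.sample(smiles, n_input + n_db)
    return pool[:n_input], pool[n_input:]  # (input molecules, retrieval DB)

inputs, retrieval_db = sample_zinc("zinc_smiles.txt", seed=0)
```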
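Since the Software Dependencies row notes that checkpoints are named but library versions are not, a sketch of loading the two open-source backbones with an explicitly pinned library release follows; the pinned transformers version is an illustrative assumption, not taken from the paper.

```python
# Loading the named open-source backbones with a pinned library version.
# The transformers release below is an assumption for illustration; the
# paper does not state which versions were used.
#   pip install "transformers==4.31.0" torch
from transformers import AutoModelForCausalLM, AutoTokenizer

for checkpoint in ("facebook/galactica-6.7b", "meta-llama/Llama-2-7b-chat-hf"):
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    # Llama 2 weights are gated: accepting Meta's license and supplying a
    # Hugging Face access token are required before this download succeeds.
    model = AutoModelForCausalLM.from_pretrained(checkpoint)
```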
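The Experiment Setup row maps directly onto an OpenAI chat completion call. Below is a minimal sketch using the legacy openai (<1.0) Python interface; the user message is a placeholder, since the paper's task prompts are not reproduced here.

```python
# Sketch of the quoted ChatGPT configuration: temperature 0,
# frequency_penalty 0.2, and the domain-expert system prompt.
# Uses the legacy openai<1.0 interface; the user prompt is a placeholder.
import openai

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0301",
    temperature=0,           # reduce randomness across runs
    frequency_penalty=0.2,   # discourage repeated or runaway sequences
    messages=[
        {"role": "system",
         "content": "You are an expert in the field of molecular chemistry."},
        {"role": "user", "content": "<drug-editing prompt for one task>"},
    ],
)
print(response["choices"][0]["message"]["content"])
```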