MARS: Markov Molecular Sampling for Multi-objective Drug Discovery
Authors: Yutong Xie, Chence Shi, Hao Zhou, Yuwei Yang, Weinan Zhang, Yong Yu, Lei Li
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that MARS achieves state-of-the-art performance in various multi-objective settings where molecular bio-activity, drug-likeness, and synthesizability are considered. |
| Researcher Affiliation | Collaboration | ByteDance AI Lab, Shanghai, China; University of Michigan, Ann Arbor, MI, USA; Montréal Institute of Learning Algorithms, Montréal, Canada; Department of Computer Science and Engineering, Shanghai Jiao Tong University, China |
| Pseudocode | Yes | Algorithm 1: MARS |
| Open Source Code | Yes | The code is available at https://github.com/yutxie/mars. |
| Open Datasets | Yes | For the fragment vocabulary, we extract the top 1000 most frequently appearing fragments that contain no more than 10 heavy atoms from the ChEMBL database (Gaulton et al., 2017) by enumerating single bonds to break. |
| Dataset Splits | No | The paper describes an adaptive self-training strategy where the model is trained 'on-the-fly' using collected samples, rather than specifying fixed training, validation, and test splits for a static dataset. |
| Hardware Specification | Yes | The computing server has two CPUs with 64 virtual cores (2.10 GHz), 231 GB memory (about 50 GB used), and one Tesla V100 GPU with 32 GB memory. |
| Software Dependencies | No | The paper mentions using MPNNs and the Adam optimizer, but does not provide specific version numbers for any software libraries or frameworks used in the implementation. |
| Experiment Setup | Yes | For the sampling process, the unnormalized target distribution is set as π(x) = Σ_k s_k(x), where s_k(x) is a scoring function for each of the above-mentioned properties of interest; the temperature is set as T = 0.95^(t/5), and N = 5000 molecules are sampled at a time. ... The MPNN model has six layers, and the node embedding size is d = 64. For model training, an Adam optimizer (Kingma & Ba, 2015) updates the model parameters with an initial learning rate of 3 × 10⁻⁴; the maximum dataset size is limited to \|D\| ≤ 75,000, and at each step the model is updated at most 25 times. |
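The annealed sampling setup in the row above can be sketched compactly. The following is a minimal, hedged illustration of a Metropolis-style acceptance step against the multi-objective target π(x) = Σ_k s_k(x) with the temperature schedule T = 0.95^(t/5); it is not the authors' implementation. The scorer functions and the numeric "molecule" stand-ins are hypothetical, and the real MARS proposals are learned fragment edits on molecular graphs with a proposal-ratio correction that is omitted here.

```python
import random

def temperature(t: int) -> float:
    """Annealing schedule reported in the paper: T = 0.95^(t/5)."""
    return 0.95 ** (t / 5)

def target_score(x, scorers) -> float:
    """Unnormalized target pi(x) = sum_k s_k(x) over property scorers
    (e.g. bio-activity, drug-likeness, synthesizability)."""
    return sum(s(x) for s in scorers)

def accept(x_old, x_new, scorers, t: int) -> bool:
    """Metropolis acceptance under the annealed distribution pi(x)^(1/T).
    Omits the proposal-distribution correction used in full MH."""
    T = temperature(t)
    pi_old = target_score(x_old, scorers)
    pi_new = target_score(x_new, scorers)
    if pi_old <= 0:  # degenerate current state: always move
        return True
    ratio = (pi_new / pi_old) ** (1.0 / T)
    return random.random() < min(1.0, ratio)

# Toy usage with hypothetical scorers on numbers instead of molecules:
scorers = [lambda x: x, lambda x: x * x]
improved = accept(1.0, 2.0, scorers, t=0)  # strictly better candidate
```

As t grows, T shrinks, so the exponent 1/T sharpens the acceptance ratio and the chain increasingly rejects moves that lower the combined score.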