MARS: Markov Molecular Sampling for Multi-objective Drug Discovery

Authors: Yutong Xie, Chence Shi, Hao Zhou, Yuwei Yang, Weinan Zhang, Yong Yu, Lei Li

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that MARS achieves state-of-the-art performance in various multi-objective settings where molecular bio-activity, drug-likeness, and synthesizability are considered.
Researcher Affiliation | Collaboration | ByteDance AI Lab, Shanghai, China; University of Michigan, Ann Arbor, MI, USA; Montréal Institute of Learning Algorithms, Montreal, Canada; Department of Computer Science and Engineering, Shanghai Jiao Tong University, China
Pseudocode | Yes | Algorithm 1: MARS
Open Source Code | Yes | The code is available at https://github.com/yutxie/mars.
Open Datasets | Yes | For the fragment vocabulary, we extract the top 1000 most frequently appearing fragments that contain no more than 10 heavy atoms from the ChEMBL database (Gaulton et al., 2017) by enumerating single bonds to break. (A hedged sketch of this extraction appears after this table.)
Dataset Splits | No | The paper describes an adaptive self-training strategy where the model is trained 'on the fly' using collected samples, rather than specifying fixed training, validation, and test splits for a static dataset.
Hardware Specification | Yes | The computing server has two CPUs with 64 virtual cores (2.10 GHz), 231 GB of memory (about 50 GB used), and one Tesla V100 GPU with 32 GB of memory.
Software Dependencies | No | The paper mentions using MPNNs and the Adam optimizer, but does not provide specific version numbers for any software libraries or frameworks used in the implementation.
Experiment Setup | Yes | For the sampling process, the unnormalized target distribution is set as π(x) = Σ_k s_k(x), where s_k(x) is a scoring function for the above-mentioned properties of interest; the temperature is set as T = 0.95^⌊t/5⌋, and we sample N = 5000 molecules at one time. ... The MPNN model has six layers, and the node embedding size is d = 64. Moreover, for the model training, we use an Adam optimizer (Kingma & Ba, 2015) to update the model parameters with an initial learning rate of 3 × 10⁻⁴, the maximum dataset size is limited to |D| ≤ 75,000, and at each step we update the model no more than 25 times. (A hedged sketch of this sampling loop appears after this table.)
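
To make the Open Datasets row concrete, here is a minimal RDKit sketch of the fragment-vocabulary extraction the paper describes. The input path `chembl_smiles.txt` (one SMILES per line) is a hypothetical placeholder, and skipping ring bonds so that each break yields exactly two pieces is our assumption, not a detail stated in the paper.

```python
# Hedged sketch of the fragment-vocabulary extraction quoted above.
# Assumptions: ChEMBL SMILES are in a local file `chembl_smiles.txt`
# (hypothetical path), and only single, non-ring bonds are broken so
# each break produces two fragments.
from collections import Counter

from rdkit import Chem

MAX_HEAVY_ATOMS = 10  # fragment size cap from the paper
VOCAB_SIZE = 1000     # keep the 1000 most frequent fragments

def fragments(mol):
    """Enumerate fragments obtained by breaking each single, non-ring bond."""
    for bond in mol.GetBonds():
        if bond.GetBondType() != Chem.BondType.SINGLE or bond.IsInRing():
            continue
        # Break this one bond and collect the two resulting pieces.
        broken = Chem.FragmentOnBonds(mol, [bond.GetIdx()], addDummies=False)
        for frag in Chem.GetMolFrags(broken, asMols=True):
            if frag.GetNumHeavyAtoms() <= MAX_HEAVY_ATOMS:
                yield Chem.MolToSmiles(frag)  # canonical SMILES as the key

counts = Counter()
with open("chembl_smiles.txt") as f:
    for line in f:
        mol = Chem.MolFromSmiles(line.strip())
        if mol is not None:
            counts.update(fragments(mol))

vocab = [smiles for smiles, _ in counts.most_common(VOCAB_SIZE)]
```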
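The Experiment Setup row quotes the target π(x) = Σ_k s_k(x) and the annealing schedule T = 0.95^⌊t/5⌋. The sketch below wires those quoted values into a simulated-annealing-style Metropolis-Hastings acceptance test; assuming a symmetric proposal and a target tempered as π(x)^(1/T) is our simplification, since MARS itself uses a learned MPNN editing model as an asymmetric proposal (see the repository for the actual implementation). The training hyperparameters are collected as constants for reference.

```python
import random

# Hyperparameters quoted in the Experiment Setup row (values from the paper).
N_SAMPLES = 5000             # molecules sampled at one time
MPNN_LAYERS = 6              # MPNN depth
NODE_EMB_DIM = 64            # node embedding size d
LEARNING_RATE = 3e-4         # Adam initial learning rate
MAX_DATASET_SIZE = 75_000    # cap on |D|
MAX_UPDATES_PER_STEP = 25    # model updates per sampling step

def pi(x, scoring_fns):
    """Unnormalized target: pi(x) = sum_k s_k(x), with s_k the property scorers."""
    return sum(s(x) for s in scoring_fns)

def temperature(t):
    """Annealing schedule quoted above: T = 0.95 ** floor(t / 5)."""
    return 0.95 ** (t // 5)

def accept(x_old, x_new, t, scoring_fns):
    """Annealed MH acceptance, assuming a symmetric proposal so the
    proposal-density ratio cancels (MARS corrects for its learned,
    asymmetric proposal)."""
    T = temperature(t)
    old, new = pi(x_old, scoring_fns), pi(x_new, scoring_fns)
    # Temper the density ratio; guard against zero scores and T -> 0.
    alpha = min(1.0, (new / max(old, 1e-8)) ** (1.0 / max(T, 1e-8)))
    return random.random() < alpha
```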