MARS: Markov Molecular Sampling for Multi-objective Drug Discovery

Authors: Yutong Xie, Chence Shi, Hao Zhou, Yuwei Yang, Weinan Zhang, Yong Yu, Lei Li

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that MARS achieves state-of-the-art performance in various multi-objective settings where molecular bio-activity, drug-likeness, and synthesizability are considered.
Researcher Affiliation | Collaboration | ByteDance AI Lab, Shanghai, China; University of Michigan, Ann Arbor, MI, USA; Montréal Institute of Learning Algorithms, Montreal, Canada; Department of Computer Science and Engineering, Shanghai Jiao Tong University, China
Pseudocode | Yes | Algorithm 1: MARS
Open Source Code | Yes | The code is available at https://github.com/yutxie/mars.
Open Datasets | Yes | For the fragment vocabulary, we extract the top 1000 most frequently appearing fragments that contain no more than 10 heavy atoms from the ChEMBL database (Gaulton et al., 2017) by enumerating single bonds to break. (A hedged sketch of this extraction appears after this table.)
Dataset Splits | No | The paper describes an adaptive self-training strategy where the model is trained 'on the fly' using collected samples, rather than specifying fixed training, validation, and test splits for a static dataset.
Hardware Specification | Yes | The computing server has two CPUs with 64 virtual cores (2.10 GHz), 231 GB of memory (about 50 GB used), and one Tesla V100 GPU with 32 GB of memory.
Software Dependencies | No | The paper mentions using MPNNs and the Adam optimizer, but does not provide specific version numbers for any software libraries or frameworks used in the implementation.
Experiment Setup | Yes | For the sampling process, the unnormalized target distribution is set as π(x) = Σ_k s_k(x), where s_k(x) is a scoring function for the above-mentioned properties of interest; the temperature is set as T = 0.95^⌊t/5⌋, and we sample N = 5000 molecules at one time. ... The MPNN model has six layers, and the node embedding size is d = 64. Moreover, for the model training, we use an Adam optimizer (Kingma & Ba, 2015) to update the model parameters with an initial learning rate of 3 × 10⁻⁴, the maximum dataset size is limited to |D| ≤ 75,000, and at each step we update the model no more than 25 times. (A hedged sketch of this sampling loop appears after this table.)
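
To make the Open Datasets row concrete, here is a minimal RDKit sketch of the fragment-vocabulary extraction the paper describes. The input path `chembl_smiles.txt` (one SMILES per line) is a hypothetical placeholder, and skipping ring bonds so that each break yields exactly two pieces is our assumption, not a detail stated in the paper.

```python
# Hedged sketch of the fragment-vocabulary extraction quoted above.
# Assumptions: ChEMBL SMILES are in a local file `chembl_smiles.txt`
# (hypothetical path), and only single, non-ring bonds are broken so
# each break produces two fragments.
from collections import Counter

from rdkit import Chem

MAX_HEAVY_ATOMS = 10  # fragment size cap from the paper
VOCAB_SIZE = 1000     # keep the 1000 most frequent fragments

def fragments(mol):
    """Enumerate fragments obtained by breaking each single, non-ring bond."""
    for bond in mol.GetBonds():
        if bond.GetBondType() != Chem.BondType.SINGLE or bond.IsInRing():
            continue
        # Break this one bond and collect the two resulting pieces.
        broken = Chem.FragmentOnBonds(mol, [bond.GetIdx()], addDummies=False)
        for frag in Chem.GetMolFrags(broken, asMols=True):
            if frag.GetNumHeavyAtoms() <= MAX_HEAVY_ATOMS:
                yield Chem.MolToSmiles(frag)  # canonical SMILES as the key

counts = Counter()
with open("chembl_smiles.txt") as f:
    for line in f:
        mol = Chem.MolFromSmiles(line.strip())
        if mol is not None:
            counts.update(fragments(mol))

vocab = [smiles for smiles, _ in counts.most_common(VOCAB_SIZE)]
```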
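The Experiment Setup row quotes the target π(x) = Σ_k s_k(x) and the annealing schedule T = 0.95^⌊t/5⌋. The sketch below wires those quoted values into a simulated-annealing-style Metropolis-Hastings acceptance test; assuming a symmetric proposal and a target tempered as π(x)^(1/T) is our simplification, since MARS itself uses a learned MPNN editing model as an asymmetric proposal (see the repository for the actual implementation). The training hyperparameters are collected as constants for reference.

```python
import random

# Hyperparameters quoted in the Experiment Setup row (values from the paper).
N_SAMPLES = 5000             # molecules sampled at one time
MPNN_LAYERS = 6              # MPNN depth
NODE_EMB_DIM = 64            # node embedding size d
LEARNING_RATE = 3e-4         # Adam initial learning rate
MAX_DATASET_SIZE = 75_000    # cap on |D|
MAX_UPDATES_PER_STEP = 25    # model updates per sampling step

def pi(x, scoring_fns):
    """Unnormalized target: pi(x) = sum_k s_k(x), with s_k the property scorers."""
    return sum(s(x) for s in scoring_fns)

def temperature(t):
    """Annealing schedule quoted above: T = 0.95 ** floor(t / 5)."""
    return 0.95 ** (t // 5)

def accept(x_old, x_new, t, scoring_fns):
    """Annealed MH acceptance, assuming a symmetric proposal so the
    proposal-density ratio cancels (MARS corrects for its learned,
    asymmetric proposal)."""
    T = temperature(t)
    old, new = pi(x_old, scoring_fns), pi(x_new, scoring_fns)
    # Temper the density ratio; guard against zero scores and T -> 0.
    alpha = min(1.0, (new / max(old, 1e-8)) ** (1.0 / max(T, 1e-8)))
    return random.random() < alpha
```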