MIMOSA: Multi-constraint Molecule Sampling for Molecule Optimization

Authors: Tianfan Fu, Cao Xiao, Xinhao Li, Lucas M. Glass, Jimeng Sun

AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We compare MIMOSA with state-of-the-art baselines on optimizing several important properties across multiple settings; MIMOSA achieves 43.7% success rate (49.1% relative improvement over the best baseline GA (Nigam et al. 2020)) when optimizing DRD and PLogP jointly."
Researcher Affiliation | Collaboration | Tianfan Fu and co-authors span five institutions: (1) College of Computing, Georgia Institute of Technology; (2) Analytics Center of Excellence, IQVIA; (3) Department of Chemistry, North Carolina State University; (4) Department of Statistics, Temple University; (5) Department of Computer Science, University of Illinois, Urbana-Champaign.
Pseudocode | Yes | Algorithm 1: "MIMOSA for Molecule Optimization"
Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the MIMOSA method. It mentions RDKit (https://www.rdkit.org/), which is a third-party tool.
Open Datasets | Yes | "We use 2 million molecules from ZINC database (Sterling and Irwin 2015; Hu et al. 2019) to train both mGNN and bGNN."
Dataset Splits | No | The paper uses the ZINC database for training but does not give specific dataset splits (percentages, sample counts, or predefined splits) for training or validation. It refers to "Details on Implementation, Features, Dataset Construction, Evaluation Strategies are in Fu et al. (2020b)", which points to this paper's own arXiv preprint, but those details are not present in the provided excerpt.
Hardware Specification | No | The paper states, "Empirically, this entire sampling process takes about 10-20 minutes for optimizing one source molecule", but does not provide any specific hardware details such as GPU/CPU models, memory, or cloud resources used for the experiments.
Software Dependencies | No | The paper mentions the RDKit package (https://www.rdkit.org/) and references a "well-trained model (Jin et al. 2019; Fu, Xiao, and Sun 2020; Fu et al. 2020a)" but does not provide version numbers for these software components.
Experiment Setup | No | The paper states "The parameter setting of these methods are provided in Fu et al. (2020b)" and "Details on Implementation, Features, Dataset Construction, Evaluation Strategies are in Fu et al. (2020b)", both of which refer to this paper's own arXiv preprint. The provided text does not contain concrete hyperparameter values (e.g., learning rate, batch size, number of epochs) or system-level training settings.
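The two figures quoted in the Research Type row can be cross-checked against each other: a 43.7% success rate described as a 49.1% relative improvement implies a GA baseline success rate of roughly 29.3%. A quick sanity check of that arithmetic (our own calculation, not a figure reported in the paper):

```python
# Back out the implied GA baseline from the two quoted MIMOSA numbers.
# This is a consistency check on the review, not a result from the paper.
mimosa_success = 0.437        # MIMOSA success rate on DRD + PLogP
relative_improvement = 0.491  # quoted relative improvement over GA

ga_baseline = mimosa_success / (1 + relative_improvement)
print(f"implied GA baseline success rate: {ga_baseline:.1%}")  # → 29.3%
```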