Molecular Optimization Model with Patentability Constraint
Authors: Sally Turutov, Kira Radinsky
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through empirical evaluation, we demonstrate the superior performance of our approach compared to state-of-the-art molecular optimization methods both in chemical property optimization and patentability. We empirically evaluate our proposed model on numerous molecule optimization tasks, demonstrating its ability to maintain similarity and optimize properties while considering patent constraints. Our results show that our model successfully reduces the similarity of optimized molecules to existing patents while still generating highly optimized molecules, thus outperforming the state-of-the-art (SOTA) models. Additionally, comprehensive ablation experiments provide detailed insights into the effectiveness of our approach and its individual components. |
| Researcher Affiliation | Academia | Sally Turutov, Kira Radinsky Technion Israel Institute of Technology turutovsally@campus.technion.ac.il, kirar@cs.technion.ac.il |
| Pseudocode | Yes | Algorithm 1 METN Training Algorithm, Algorithm 2 EETN Training Algorithm, Algorithm 3 Extended-EETN Training Algorithm, Algorithm 4 End-to-End Training Algorithm |
| Open Source Code | Yes | To facilitate further research and exploration of the problem, we provide the community with access to our code and data through the following link: https://github.com/SallyTurutov/MOMP. |
| Open Datasets | Yes | We utilized datasets from (Jin et al. 2019). ... The SureChEMBL dataset (Papadatos et al. 2016) focuses on patent compounds, providing Maximum Common Substructures (MCSs) representing shared core chemical structures within a patent. |
| Dataset Splits | No | The paper mentions using 'training sets of molecules' and partitioning data into domain-specific sets (A, B, C), and using 'original datasets for training and testing' for baseline models. However, it does not provide specific percentages, counts, or explicit instructions for train/validation/test splits for its own experimental setup with the A, B, C domains. |
| Hardware Specification | No | No specific hardware specifications (e.g., GPU models, CPU types, or memory details) used for running the experiments were provided in the paper. |
| Software Dependencies | No | The paper mentions the use of the Adam optimizer but does not specify version numbers for any software dependencies or libraries used in the implementation. |
| Experiment Setup | Yes | We employ the Adam optimizer with a learning rate of 3×10⁻⁴, a mini-batch size of 32, and set the maximum number of training epochs E_max^Train to 12 for QED and 18 for DRD2. The regularization parameters are λAB = λBC = λAC = 2. |
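The hyperparameters reported in the Experiment Setup row can be collected into a single configuration helper. This is a minimal sketch for reimplementation purposes only; the function name, dict keys, and task labels are illustrative assumptions, not taken from the authors' released code. Only the numeric values (Adam, lr 3×10⁻⁴, batch size 32, max epochs 12/18, λAB = λBC = λAC = 2) come from the paper.

```python
def training_config(task: str) -> dict:
    """Return the hyperparameters reported in the paper for a given task.

    Hypothetical helper: structure and key names are illustrative. The paper
    reports Adam with learning rate 3e-4, mini-batch size 32, maximum epochs
    of 12 (QED) / 18 (DRD2), and regularization weights
    lambda_AB = lambda_BC = lambda_AC = 2.
    """
    max_epochs = {"QED": 12, "DRD2": 18}
    if task not in max_epochs:
        raise ValueError(f"unknown task: {task}")
    return {
        "optimizer": "Adam",
        "learning_rate": 3e-4,
        "batch_size": 32,
        "max_epochs": max_epochs[task],
        "lambda_AB": 2.0,
        "lambda_BC": 2.0,
        "lambda_AC": 2.0,
    }
```

A reimplementation could pass `training_config("QED")["learning_rate"]` directly to its optimizer constructor; keeping the reported values in one place makes it easier to audit a reproduction attempt against the paper.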