Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
CORE: Automatic Molecule Optimization Using Copy & Refine Strategy
Authors: Tianfan Fu, Cao Xiao, Jimeng Sun
AAAI 2020, pp. 638-645 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We tested CORE and baselines using the ZINC database, and CORE obtained up to 11% and 21% relative improvement over the baselines on success rate on the complete test set and the subset with infrequent substructures, respectively. |
| Researcher Affiliation | Collaboration | Tianfan Fu¹, Cao Xiao², Jimeng Sun¹; ¹College of Computing, Georgia Institute of Technology, Atlanta, USA; ²Analytics Center of Excellence, IQVIA, Cambridge, USA. EMAIL, EMAIL, EMAIL |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/futianfan/CORE. |
| Open Datasets | Yes | First, we introduce the molecule data that we are using. The ZINC dataset contains 250K drug molecules extracted from the ZINC database (Sterling and Irwin 2015). |
| Dataset Splits | Yes | Table 3: Statistics of 4 datasets: DRD2, QED, LogP04, and LogP06. ... Columns: Dataset / # Training Pairs / # Valid Pairs / # Test ( 20) |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments. It only discusses general aspects of the experimental setup. |
| Software Dependencies | No | The paper mentions software components like “Adam optimizer” and “multi-layer feedforward network” but does not specify any software or library names with version numbers required for replication. |
| Experiment Setup | Yes | In this section, we provide the implementation details for reproducibility, especially the setting of hyperparameters. We follow most of the hyperparameter settings of (Jin et al. 2019). For all baseline methods and datasets, the maximal epoch number is set to 10 and the batch size is set to 32. In the encoder module, the embedding size is set to 300. The depths of the message passing networks are set to 6 and 3 for the tree and graph, respectively. The initial learning rate is set to 1e-3 with the Adam optimizer. After every epoch, the learning rate is annealed by a factor of 0.8. |
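
For convenience, the hyperparameters quoted in the Experiment Setup row can be collected into a single training configuration. The sketch below is a minimal, hypothetical PyTorch stand-in: the model, vocabulary size, sequence length, and dummy data are placeholders rather than CORE's actual graph-to-graph architecture; only the embedding size, batch size, epoch count, optimizer, initial learning rate, and per-epoch annealing factor come from the paper's text.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

EMBED_DIM = 300   # encoder embedding size (paper)
BATCH_SIZE = 32   # batch size (paper)
MAX_EPOCHS = 10   # maximal epoch number (paper)
LR_INIT = 1e-3    # initial learning rate (paper)
LR_ANNEAL = 0.8   # per-epoch learning-rate annealing factor (paper)
SEQ_LEN, VOCAB = 8, 1000  # hypothetical stand-ins, not from the paper

# Hypothetical stand-in for the CORE encoder/decoder. The real model uses
# junction-tree and graph message passing with depths 6 and 3, respectively.
model = nn.Sequential(
    nn.Embedding(VOCAB, EMBED_DIM),
    nn.Flatten(),
    nn.Linear(SEQ_LEN * EMBED_DIM, 1),
)

optimizer = torch.optim.Adam(model.parameters(), lr=LR_INIT)
# Anneal the learning rate by a factor of 0.8 after every epoch, as quoted.
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=LR_ANNEAL)

# Dummy data standing in for (input molecule, target molecule) training pairs.
xs = torch.randint(0, VOCAB, (256, SEQ_LEN))
ys = torch.randn(256, 1)
loader = DataLoader(TensorDataset(xs, ys), batch_size=BATCH_SIZE, shuffle=True)

loss_fn = nn.MSELoss()
for epoch in range(MAX_EPOCHS):
    for x, y in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()
    print(f"epoch {epoch}: lr = {scheduler.get_last_lr()[0]:.2e}")
```

Under this schedule the learning rate at epoch t is 1e-3 × 0.8^t, so it falls to roughly 1.3e-4 by the tenth and final epoch.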