Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Molecular Hypergraph Grammar with Its Application to Molecular Optimization
Authors: Hiroshi Kajino
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the effectiveness of MHG in the molecular optimization domain. ... We use the ZINC dataset following the existing work. ... Table 1. Reconstruction rate, predictive performance, and global molecular optimization with the unlimited oracle. |
| Researcher Affiliation | Industry | 1MIT-IBM Watson AI Lab; IBM Research, Tokyo, Japan. Correspondence to: Hiroshi Kajino <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Latent Representation Inference In: Mols and targets, G0 = {g_n}_{n=1}^N, Y0 = {y_n}_{n=1}^N. ... Algorithm 2 Global Molecular Optimization In: Z0, Y0, Dec, #iterations K, #candidates M. |
| Open Source Code | No | The paper does not provide concrete access to source code (specific repository link, explicit code release statement, or code in supplementary materials) for the methodology described in this paper. |
| Open Datasets | Yes | We use the ZINC dataset following the existing work. This dataset is extracted from the ZINC database (Irwin et al., 2012) and contains 220,011 molecules for training, 24,445 for validation, and 5,000 for testing. |
| Dataset Splits | Yes | We use the ZINC dataset following the existing work. This dataset is extracted from the ZINC database (Irwin et al., 2012) and contains 220,011 molecules for training, 24,445 for validation, and 5,000 for testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. It only vaguely mentions 'our environment' in relation to memory consumption for a baseline. |
| Software Dependencies | No | The paper mentions using 'GPy Opt (The GPy Opt authors, 2016)' but does not provide specific version numbers for software dependencies needed for replication. |
| Experiment Setup | Yes | For our method, we ο¬rst obtain latent representations by Algorithm 1. Then, we apply PCA to the latent vectors to obtain 40-dimensional latent representations. Then, we run Algorithm 2 with M = 50, K = 5. ... For our method and JT-VAE, we initialize GP with N = 250 labeled molecules randomly selected from the training set, and run Algorithm 2 with M = 1, K = 250. |