Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Constrained Graph Variational Autoencoders for Molecule Design
Authors: Qi Liu, Miltiadis Allamanis, Marc Brockschmidt, Alexander Gaunt
NeurIPS 2018 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments compare our approach with a wide range of baselines on the molecule generation task and show that our method is successful at matching the statistics of the original dataset on semantically important metrics. |
| Researcher Affiliation | Collaboration | Qi Liu¹, Miltiadis Allamanis², Marc Brockschmidt², and Alexander L. Gaunt²; ¹Singapore University of Technology and Design, ²Microsoft Research, Cambridge |
| Pseudocode | No | The paper contains Figure 1 which illustrates the generative procedure as a diagram, but there are no structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our implementation of CGVAE can be found at https://github.com/Microsoft/constrained-graph-variational-autoencoder. |
| Open Datasets | Yes | QM9 [26, 27], an enumeration of 134k stable organic molecules with up to 9 heavy atoms (carbon, oxygen, nitrogen and fluorine). ZINC dataset [12], a curated set of 250k commercially available drug-like chemical compounds. CEPDB [10, 11], a dataset of organic molecules with an emphasis on photo-voltaic applications. |
| Dataset Splits | No | The paper mentions training on datasets and sampling generated molecules, but does not specify train/validation/test dataset splits with percentages or counts for reproduction. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models or memory specifications used for running experiments. |
| Software Dependencies | No | The paper mentions using 'RDKit' but does not provide version numbers for RDKit or any other software dependencies. |
| Experiment Setup | Yes | Our experiments use S = 7. In our implementation, Eℓ is a dimension-preserving linear transformation. C and Lℓ are fully connected networks with a single hidden layer of 200 units and ReLU non-linearities. In our experiments, both g1 and g2 are implemented as linear transformations that project to scalars. We allow deviation from the pure VAE loss (λ1 = 1) following Yeung et al. [34]. |
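
The network shapes quoted in the Experiment Setup row can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: only the hidden width of 200 and the ReLU non-linearity come from the quoted text, while the input size, output size, weight initialization, and all function names are assumptions chosen for illustration.

```python
import random

HIDDEN_UNITS = 200  # from the quoted setup; every other size below is an assumption


def init_linear(n_in, n_out, rng):
    """Return (weights, biases) for a dense layer with small random weights."""
    scale = 1.0 / n_in ** 0.5  # hypothetical initialization, not from the paper
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def linear(layer, x):
    """Apply y = W x + b to a single input vector x."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(v):
    """Element-wise max(0, a)."""
    return [max(0.0, a) for a in v]


def make_mlp(n_in, n_out, rng):
    """Single-hidden-layer network: n_in -> 200 (ReLU) -> n_out."""
    return init_linear(n_in, HIDDEN_UNITS, rng), init_linear(HIDDEN_UNITS, n_out, rng)


def mlp_forward(mlp, x):
    hidden_layer, output_layer = mlp
    return linear(output_layer, relu(linear(hidden_layer, x)))


rng = random.Random(0)
feature_dim = 16  # hypothetical input size
num_outputs = 4   # hypothetical number of output scores

mlp = make_mlp(feature_dim, num_outputs, rng)
x = [rng.gauss(0.0, 1.0) for _ in range(feature_dim)]
scores = mlp_forward(mlp, x)

# A linear map that projects to a scalar, in the spirit of g1 and g2.
g = init_linear(feature_dim, 1, rng)
gate_score = linear(g, x)[0]
```

Here `mlp_forward` stands in for a network of the kind described for C and Lℓ, and `g` for a scalar projection like g1 or g2. How these pieces are wired into the full model is described in the paper and the linked repository, not in this sketch.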