Data-Efficient Graph Grammar Learning for Molecular Generation

Authors: Minghao Guo, Veronika Thost, Beichen Li, Payel Das, Jie Chen, Wojciech Matusik

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | From the paper: "Our evaluation investigates the following five questions: How do SOTA models for molecule generation perform on realistic small monomer datasets? Is our approach effective in generating specific types of monomers that are synthesizable? How do the models perform on larger monomer datasets? Can our approach learn to weigh and optimize different metrics according to user needs? Can our grammar's explainability support applications, such as functional group extraction?" From Section 5.1 (Experiment Setup): "Data. We use three small datasets, each representing a specific class of monomers, which we curate manually from the literature: Acrylates, Chain Extenders, and Isocyanates, containing only 32, 11, and 11 samples, respectively (printed in Appendix G)." From Section 5.2 (Results on Small, Class-Specific Polymer Data): "Results. Table 1 shows the results on the Isocyanate data; due to lack of space, the other two tables are in Appendix C.1."
Researcher Affiliation | Collaboration | 1 MIT CSAIL, 2 MIT-IBM Watson AI Lab, 3 IBM Research
Pseudocode | No | The paper describes the overall pipeline and grammar construction process using figures and textual descriptions, but it does not include a formal pseudocode block or an algorithm section.
Open Source Code | Yes | Code is available at https://github.com/gmh14/data_efficient_grammar.
Open Datasets | Yes | "Data. We use three small datasets, each representing a specific class of monomers, which we curate manually from the literature: Acrylates, Chain Extenders, and Isocyanates, containing only 32, 11, and 11 samples, respectively (printed in Appendix G). For comparison and for pretraining baselines, we also use a large collection of 81k monomers from St. John et al. (2019) and Jin et al. (2020)" (https://github.com/wengong-jin/hgraph2graph).
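As a minimal sketch, the curated SMILES lists could be loaded with RDKit as below. The file path and the one-SMILES-per-line layout are assumptions for illustration, not details taken from the paper or the repository.

```python
# Hypothetical loader for one of the curated monomer datasets.
# Assumes a plain-text file with one SMILES string per line; the
# path below is illustrative, not taken from the paper or repo.
from rdkit import Chem

with open("datasets/isocyanates.txt") as f:
    smiles = [line.strip() for line in f if line.strip()]

mols = [Chem.MolFromSmiles(s) for s in smiles]
assert all(m is not None for m in mols), "dataset contains invalid SMILES"
print(f"Loaded {len(mols)} monomers")
```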
Dataset Splits | No | The paper specifies the total number of samples in the small datasets and the number of training samples used for the large polymer dataset (117 or 239 samples), but it does not provide explicit train/validation/test splits (e.g., percentages, per-split sample counts, or a splitting methodology) for all datasets used in the experiments. Generated molecules are evaluated against the training data distribution rather than against a held-out test set.
Hardware Specification | No | The paper does not specify the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies | No | The paper mentions using a "pretrained graph neural network (Hu et al., 2019) as our feature extractor" and the Adam optimizer, but it does not provide version numbers for any software dependencies such as Python, PyTorch, or other libraries.
Experiment Setup | Yes | For the potential function Fθ, we use a two-layer fully connected network with hidden sizes 300 and 128. For the optimization objectives, we consider two metrics: diversity and RS. For hyperparameters, we set the MC sampling size to 5. We use the Adam optimizer to train the two-layer network with learning rate 0.01 and train for 20 epochs.
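A minimal PyTorch sketch of this setup follows, assuming the stated sizes are hidden-layer widths with a scalar output head and a REINFORCE-style update over the 5 MC samples; the feature dimension, random reward placeholder, and loss form are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

feat_dim = 300  # assumption: dimensionality of the pretrained GNN features

# Two-layer MLP with hidden sizes 300 and 128, as reported; the final
# 1-d head (one scalar potential per candidate) is an assumption.
f_theta = nn.Sequential(
    nn.Linear(feat_dim, 300), nn.ReLU(),
    nn.Linear(300, 128), nn.ReLU(),
    nn.Linear(128, 1),
)

optimizer = torch.optim.Adam(f_theta.parameters(), lr=0.01)

# Dummy training loop standing in for the update over 5 MC-sampled
# grammars; `reward` here is random noise, not the paper's actual
# diversity/RS metrics.
for epoch in range(20):  # the paper reports training for 20 epochs
    feats = torch.randn(5, feat_dim)      # 5 MC samples
    scores = f_theta(feats).squeeze(-1)   # scalar potential per sample
    reward = torch.randn(5)               # placeholder metric values
    loss = -(reward.detach() * scores).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```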