Learning Multimodal Graph-to-Graph Translation for Molecule Optimization

Authors: Wengong Jin, Kevin Yang, Regina Barzilay, Tommi Jaakkola

ICLR 2019

Reproducibility assessment. Each entry below gives the Reproducibility Variable, its Result, and the supporting LLM Response:
Research Type: Experimental
LLM Response: "We evaluate our model on multiple molecular optimization tasks and show that our model outperforms previous state-of-the-art baselines."
Researcher Affiliation: Academia
LLM Response: "Wengong Jin, Kevin Yang, Regina Barzilay, Tommi Jaakkola. Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology. {wengong, regina, tommi}@csail.mit.edu; yangk@mit.edu"
Pseudocode: Yes
LLM Response: "Algorithm 1 Adversarial Scaffold Regularization"
Open Source Code: Yes
LLM Response: "Code and data are available at https://github.com/wengong-jin/iclr19-graph2graph"
Open Datasets: Yes
LLM Response: "we extracted 99K and 79K translation pairs respectively from the ZINC dataset (Sterling & Irwin, 2015; Jin et al., 2018) for training. We extracted a training set of 88K molecule pairs with similarity constraint δ = 0.4. With similarity constraint δ = 0.4, we derived a training set of 34K molecular pairs from ZINC and the dataset collected by Olivecrona et al. (2017)." (A sketch of this similarity-constrained pair extraction follows the table.)
Dataset Splits: No
LLM Response: "On the penalized log P task... We use their validation and test sets for evaluation. For each task, we ensured that all molecules in validation and test set had never appeared during training." The paper states that the validation and test sets come from prior work and are kept disjoint from training, but it gives no split percentages, absolute counts, or a detailed splitting methodology for all datasets; for the QED and DRD2 tasks, only train and test set sizes are reported.
Hardware Specification: No
LLM Response: The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments.
Software Dependencies: No
LLM Response: The paper mentions the Adam optimizer, RDKit (cited but without a version number), and, implicitly, Python, yet it does not pin version numbers for any key software dependency (e.g., 'PyTorch 1.9' or 'RDKit 2020.09.1').
Experiment Setup: Yes
LLM Response: "For our models, the hidden state dimension is 300 and latent code dimension |z| = 8, and we set the KL regularization weight λKL = 1/|z|. For the VSeq2Seq model, the encoder is a one-layer bidirectional LSTM and the decoder is a one-layer LSTM with hidden state dimension 600. All models are trained with the Adam optimizer for 20 epochs with learning rate 0.001. We anneal the learning rate by 0.9 for every epoch. For adversarial training, our discriminator is a three-layer feed-forward network with hidden layer dimension 300 and LeakyReLU activation function. The discriminator is trained for N = 5 iterations with gradient penalty weight β = 10." (A code sketch of these hyperparameters follows below.)
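The Open Datasets row reports translation pairs filtered by a similarity constraint δ = 0.4. The snippet below is a minimal sketch of how such pairs could be extracted with RDKit, assuming Tanimoto similarity over Morgan fingerprints (radius 2, 2048 bits); the function names, fingerprint settings, and `extract_pairs` helper are illustrative assumptions, not taken from the paper's released code.

```python
# Hypothetical sketch of similarity-constrained pair extraction with RDKit.
# Assumes Tanimoto similarity over Morgan fingerprints; settings are illustrative.
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def tanimoto(smiles_x, smiles_y, radius=2, n_bits=2048):
    """Tanimoto similarity between Morgan fingerprints of two SMILES strings."""
    mol_x = Chem.MolFromSmiles(smiles_x)
    mol_y = Chem.MolFromSmiles(smiles_y)
    if mol_x is None or mol_y is None:
        return 0.0  # unparsable SMILES contribute no pair
    fp_x = AllChem.GetMorganFingerprintAsBitVect(mol_x, radius, nBits=n_bits)
    fp_y = AllChem.GetMorganFingerprintAsBitVect(mol_y, radius, nBits=n_bits)
    return DataStructs.TanimotoSimilarity(fp_x, fp_y)

def extract_pairs(candidates, delta=0.4):
    """Keep candidate (X, Y) translation pairs whose similarity meets delta."""
    return [(x, y) for x, y in candidates if tanimoto(x, y) >= delta]
```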
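The Experiment Setup row pins down most training hyperparameters. The PyTorch sketch below shows one plausible instantiation of them: a three-layer feed-forward discriminator with LeakyReLU activations, Adam at learning rate 0.001 annealed by 0.9 per epoch, λKL = 1/|z| with |z| = 8, and a gradient penalty with weight β = 10 in the WGAN style that the paper's adversarial training builds on. The module layout and names are assumptions; only the numeric constants come from the quoted setup.

```python
# Hypothetical PyTorch sketch of the reported hyperparameters; only the
# numeric constants come from the paper, the structure is an assumption.
import torch
import torch.nn as nn

HIDDEN_DIM = 300               # hidden state dimension
LATENT_DIM = 8                 # latent code dimension |z|
KL_WEIGHT = 1.0 / LATENT_DIM   # lambda_KL = 1 / |z|
GP_WEIGHT = 10.0               # gradient penalty weight beta
DISC_ITERS = 5                 # discriminator iterations N
EPOCHS = 20

# One reading of "three-layer feed-forward network with hidden layer
# dimension 300 and LeakyReLU activation function".
discriminator = nn.Sequential(
    nn.Linear(HIDDEN_DIM, HIDDEN_DIM),
    nn.LeakyReLU(),
    nn.Linear(HIDDEN_DIM, HIDDEN_DIM),
    nn.LeakyReLU(),
    nn.Linear(HIDDEN_DIM, 1),
)

# Adam with learning rate 0.001, annealed by 0.9 after every epoch.
optimizer = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9)

def gradient_penalty(disc, real, fake):
    """WGAN-style gradient penalty on interpolates of real and fake inputs."""
    eps = torch.rand(real.size(0), 1, device=real.device)
    interp = (eps * real + (1.0 - eps) * fake).requires_grad_(True)
    grads, = torch.autograd.grad(disc(interp).sum(), interp, create_graph=True)
    return GP_WEIGHT * ((grads.norm(2, dim=1) - 1.0) ** 2).mean()
```

In a full training loop one would call `scheduler.step()` once per epoch and add `gradient_penalty(...)` to the discriminator loss for each of the N = 5 discriminator iterations; the complete procedure is given by the paper's Algorithm 1 (Adversarial Scaffold Regularization), which this report only names.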