Preference Optimization for Molecule Synthesis with Conditional Residual Energy-based Models
Authors: Songtao Liu, Hanjun Dai, Yue Zhao, Peng Liu
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that our framework can consistently boost performance across various strategies and outperforms previous state-of-the-art top-1 accuracy by a margin of 2.5%. |
| Researcher Affiliation | Collaboration | 1The Pennsylvania State University 2Google DeepMind 3University of Southern California. Correspondence to: Songtao Liu <skl5761@psu.edu>. |
| Pseudocode | Yes | Algorithm 1 CREBM Framework |
| Open Source Code | Yes | Code is available at https://github.com/SongtaoLiu0823/CREBM. |
| Open Datasets | Yes | Dataset. We use the public dataset RetroBench (Liu et al., 2023b) for evaluation. The target molecules associated with synthetic routes are split into training, validation, and test datasets in an 80%/10%/10% ratio. |
| Dataset Splits | Yes | The target molecules associated with synthetic routes are split into training, validation, and test datasets in an 80%/10%/10% ratio. We have 46,458 data points for training, 5,803 for validation, and 5,838 for testing. |
| Hardware Specification | Yes | All the experiments of baselines are conducted on a single NVIDIA Tesla A100 with 80GB of memory. |
| Software Dependencies | Yes | The software that we use for experiments is Python 3.6.8, CUDA 10.2.89, CUDNN 7.6.5, einops 0.4.1, pytorch 1.9.0, pytorch-scatter 2.0.9, pytorch-sparse 0.6.12, numpy 1.19.2, torchvision 0.10.0, and torchdrug 0.1.3. |
| Experiment Setup | Yes | We employ a standard Transformer (Vaswani et al., 2017) architecture to implement Eθ(T \| m_tar, c), with the target molecule serving as the input to the encoder and the right-shifted starting material as the input to the decoder. The output is the logits of the left-shifted starting material, which are used to compute Eθ. One thing we'd like to point out is that Eθ is first pretrained on the target-to-starting-material task, so we naturally deploy it for this modeling instead of training an encoder-only model from scratch. We also employ the standard Transformer architecture to implement the forward model, framing the prediction of a product from starting materials as a sequence-to-sequence task. To construct our preference dataset D, we sample 10 synthetic routes for each molecule in the training dataset. All the models in this work are trained on an NVIDIA Tesla A100 GPU. The tables in 'D.2. Hyperparameter Details' provide concrete values for max length, embedding size, encoder layers, decoder layers, attention heads, FFN hidden size, dropout, epochs, batch size, warmup, and learning rate. |
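
The Experiment Setup row describes an encoder-decoder Transformer whose decoder logits over the starting-material tokens are used to compute the energy Eθ(T | m_tar, c). The sketch below is a minimal, hypothetical PyTorch rendering of that interface, not the authors' released code (see the linked repository for the actual implementation): the class name, hyperparameter defaults, the omission of the conditioning term c, and the pooling of token logits into a scalar energy via a summed negative log-likelihood are all illustrative assumptions.

```python
# Hypothetical sketch of the energy model described above: a standard
# Transformer encoder-decoder; the target molecule feeds the encoder, the
# right-shifted starting-material tokens feed the decoder, and the decoder
# logits over the left-shifted tokens are pooled into a scalar energy.
import torch
import torch.nn as nn
import torch.nn.functional as F


class EnergyTransformer(nn.Module):
    def __init__(self, vocab_size=100, d_model=256, nhead=8,
                 num_layers=4, dim_ff=1024, pad_id=0):
        super().__init__()
        self.pad_id = pad_id
        self.embed = nn.Embedding(vocab_size, d_model, padding_idx=pad_id)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            dim_feedforward=dim_ff, batch_first=True)
        self.proj = nn.Linear(d_model, vocab_size)

    def forward(self, target_mol, sm_shift_right, sm_shift_left):
        # target_mol:     (B, S) token ids of the target molecule (encoder input)
        # sm_shift_right: (B, L) right-shifted starting-material ids (decoder input)
        # sm_shift_left:  (B, L) left-shifted starting-material ids (prediction target)
        causal_mask = self.transformer.generate_square_subsequent_mask(
            sm_shift_right.size(1)).to(target_mol.device)
        hidden = self.transformer(
            src=self.embed(target_mol),
            tgt=self.embed(sm_shift_right),
            tgt_mask=causal_mask,
            src_key_padding_mask=target_mol.eq(self.pad_id),
            tgt_key_padding_mask=sm_shift_right.eq(self.pad_id))
        logits = self.proj(hidden)                       # (B, L, vocab)
        # Assumed pooling: energy = negative log-likelihood of the
        # starting-material tokens, summed over non-padding positions.
        nll = F.cross_entropy(
            logits.transpose(1, 2), sm_shift_left,
            ignore_index=self.pad_id, reduction="none")  # (B, L)
        return nll.sum(dim=1)                            # one energy per route


# Toy usage with random token ids, just to exercise the interface:
model = EnergyTransformer()
tgt_mol = torch.randint(1, 100, (2, 12))
sm_right = torch.randint(1, 100, (2, 10))
sm_left = torch.randint(1, 100, (2, 10))
energy = model(tgt_mol, sm_right, sm_left)               # shape (2,)
```

Under this reading, pretraining on the target-to-starting-material task amounts to training the same network with a standard sequence-to-sequence cross-entropy objective before reusing its logits for energy scoring; how CREBM actually combines or calibrates the energies for preference optimization is detailed in the paper and repository rather than in this excerpt.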