Data-Efficient Molecular Generation with Hierarchical Textual Inversion
Authors: Seojin Kim, Jaehyun Nam, Sihyun Yu, Younghoon Shin, Jinwoo Shin
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the superiority of HI-Mol with notable data-efficiency. For instance, on QM9, HI-Mol outperforms the prior state-of-the-art method with 50× less training data. We also show the effectiveness of molecules generated by HI-Mol in low-shot molecular property prediction. |
| Researcher Affiliation | Academia | ¹Korea Advanced Institute of Science and Technology (KAIST), ²Korea University. |
| Pseudocode | Yes | Algorithm 1: Modification algorithm for an invalid SMILES string (see the validity-check sketch after the table) |
| Open Source Code | Yes | Code is available at https://github.com/Seojin-Kim/HI-Mol. |
| Open Datasets | Yes | We consider three datasets in the MoleculeNet (Wu et al., 2018) benchmark (originally designed for activity detection): HIV, BBBP, and BACE. |
| Dataset Splits | Yes | We utilize a common splitting scheme for the MoleculeNet datasets, scaffold split with a split ratio of train:valid:test = 80:10:10 (Wu et al., 2018). (See the scaffold-split sketch after the table.) |
| Hardware Specification | Yes | Our experiment is conducted for 1,000 epochs using a single NVIDIA GeForce RTX 3090 GPU with a batch size of 4. |
| Software Dependencies | No | The paper mentions software such as 'MolT5-Large-Caption2Smiles', 'T5', and the 'AdamW optimizer' but does not provide specific version numbers for any of these, nor for any programming languages or libraries used. |
| Experiment Setup | Yes | Our experiment is conducted for 1,000 epochs using a single NVIDIA GeForce RTX 3090 GPU with a batch size of 4. We use the AdamW optimizer with ϵ = 1.0 × 10⁻⁸ and set the learning rate to 0.3 with a linear scheduler. We clip gradients with a maximum norm of 1.0. (See the training-loop sketch after the table.) |
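
The Pseudocode row above points to the paper's Algorithm 1, which repairs invalid SMILES strings produced during sampling. The paper's exact repair rules are not reproduced here; the following is a minimal sketch, assuming RDKit, of the validity check such an algorithm builds on. The `repair_smiles` helper and its truncation heuristic are hypothetical placeholders, not the paper's algorithm.

```python
from typing import Optional

from rdkit import Chem, RDLogger

# Silence RDKit's parse-error messages while probing candidate strings.
RDLogger.DisableLog("rdApp.error")

def is_valid_smiles(smiles: str) -> bool:
    """Return True if RDKit can parse the SMILES string into a molecule."""
    return Chem.MolFromSmiles(smiles) is not None

def repair_smiles(smiles: str) -> Optional[str]:
    """Hypothetical repair loop (NOT the paper's Algorithm 1): drop trailing
    characters until the prefix parses, then return the canonical form."""
    for end in range(len(smiles), 0, -1):
        mol = Chem.MolFromSmiles(smiles[:end])
        if mol is not None:
            return Chem.MolToSmiles(mol)
    return None  # nothing recoverable
```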
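
For the Dataset Splits row, a scaffold split groups molecules by their Bemis–Murcko scaffold so that structurally similar molecules never leak across train/valid/test. Below is a minimal sketch assuming RDKit and a list of SMILES strings; the greedy largest-group-first assignment follows the common MoleculeNet convention, but tie-breaking details may differ from the split the authors used.

```python
from collections import defaultdict

from rdkit.Chem.Scaffolds import MurckoScaffold

def scaffold_split(smiles_list, frac_train=0.8, frac_valid=0.1):
    """Greedy scaffold split into train/valid/test index lists (80:10:10)."""
    groups = defaultdict(list)
    for i, smi in enumerate(smiles_list):
        # Bemis-Murcko scaffold as a canonical SMILES key.
        key = MurckoScaffold.MurckoScaffoldSmiles(smiles=smi, includeChirality=False)
        groups[key].append(i)

    n = len(smiles_list)
    train, valid, test = [], [], []
    # Assign whole scaffold groups (largest first) so the sets stay disjoint.
    for group in sorted(groups.values(), key=len, reverse=True):
        if len(train) + len(group) <= frac_train * n:
            train.extend(group)
        elif len(valid) + len(group) <= frac_valid * n:
            valid.extend(group)
        else:
            test.extend(group)
    return train, valid, test
```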
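
The Experiment Setup row fully specifies the optimizer configuration, so it translates almost directly into PyTorch. A minimal sketch under stated assumptions: the model and the dummy batches stand in for the HI-Mol model and data, and the linear schedule is assumed to decay over all training steps, since the paper says only "linear scheduler".

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LinearLR

# Placeholder model and data; the real ones come from the HI-Mol codebase.
model = torch.nn.Linear(512, 512)
dataloader = [torch.randn(4, 512) for _ in range(10)]  # batch size 4, as in the paper

epochs = 1000
optimizer = AdamW(model.parameters(), lr=0.3, eps=1e-8)  # values from the paper
scheduler = LinearLR(optimizer, start_factor=1.0, end_factor=0.0,
                     total_iters=epochs * len(dataloader))

for epoch in range(epochs):
    for batch in dataloader:
        optimizer.zero_grad()
        loss = model(batch).pow(2).mean()  # dummy loss for illustration
        loss.backward()
        # Clip gradients to a maximum norm of 1.0, as stated in the paper.
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
        scheduler.step()
```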