Data-Efficient Molecular Generation with Hierarchical Textual Inversion

Authors: Seojin Kim, Jaehyun Nam, Sihyun Yu, Younghoon Shin, Jinwoo Shin

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate the superiority of HI-Mol with notable data-efficiency. For instance, on QM9, HI-Mol outperforms the prior state-of-the-art method with 50× less training data. We also show the effectiveness of molecules generated by HI-Mol in low-shot molecular property prediction.
Researcher Affiliation | Academia | Korea Advanced Institute of Science and Technology (KAIST); Korea University.
Pseudocode | Yes | Algorithm 1: Modification algorithm for an invalid SMILES string (an illustrative validity-repair sketch appears below the table).
Open Source Code | Yes | Code is available at https://github.com/Seojin-Kim/HI-Mol.
Open Datasets | Yes | We consider three datasets in the MoleculeNet (Wu et al., 2018) benchmark (originally designed for activity detection): HIV, BBBP, and BACE.
Dataset Splits | Yes | We utilize a common splitting scheme for the MoleculeNet datasets: scaffold split with a ratio of train:valid:test = 80:10:10 (Wu et al., 2018) (see the scaffold-split sketch below the table).
Hardware Specification | Yes | Our experiment is conducted for 1,000 epochs using a single NVIDIA GeForce RTX 3090 GPU with a batch size of 4.
Software Dependencies | No | The paper mentions software such as MolT5-Large-Caption2Smiles, T5, and the AdamW optimizer, but does not provide version numbers for these or for any programming languages or libraries used.
Experiment Setup | Yes | Our experiment is conducted for 1,000 epochs using a single NVIDIA GeForce RTX 3090 GPU with a batch size of 4. We use the AdamW optimizer with ϵ = 1.0 × 10⁻⁸ and a learning rate of 0.3 with a linear scheduler. We clip gradients with a maximum norm of 1.0. (A PyTorch sketch of this setup appears below the table.)
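
The paper's Algorithm 1 repairs invalid SMILES strings produced during generation; its exact modification rules are given in the paper and are not reproduced here. The sketch below only illustrates the surrounding validity check with RDKit, with an assumed right-truncation heuristic (the `repair_smiles` name and the repair rule are hypothetical, not the authors' method).

```python
# Illustrative validity check for generated SMILES strings. The truncation
# heuristic is an assumed stand-in for the paper's Algorithm 1, which
# specifies its own modification rules.
from typing import Optional

from rdkit import Chem, RDLogger

RDLogger.DisableLog("rdApp.*")  # silence RDKit parse warnings


def repair_smiles(smiles: str) -> Optional[str]:
    """Return a canonical valid SMILES, or None if no valid prefix exists."""
    for end in range(len(smiles), 0, -1):
        mol = Chem.MolFromSmiles(smiles[:end])
        if mol is not None:
            return Chem.MolToSmiles(mol)  # canonicalize the repaired string
    return None


print(repair_smiles("CCO)C"))  # unbalanced ')' -> repaired to "CCO"
```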
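For the scaffold split quoted above, a minimal sketch in the spirit of the MoleculeNet protocol is shown below: molecules are grouped by their Bemis-Murcko scaffold and whole groups are assigned greedily, so no scaffold appears in more than one split. This is the standard construction, not the authors' released code.

```python
# Minimal scaffold split (train:valid:test = 80:10:10) following the
# MoleculeNet protocol: group by Bemis-Murcko scaffold, assign whole groups.
from collections import defaultdict

from rdkit.Chem.Scaffolds.MurckoScaffold import MurckoScaffoldSmiles


def scaffold_split(smiles_list, frac_train=0.8, frac_valid=0.1):
    # Group molecule indices by their Bemis-Murcko scaffold SMILES.
    groups = defaultdict(list)
    for i, smi in enumerate(smiles_list):
        groups[MurckoScaffoldSmiles(smiles=smi)].append(i)

    # Assign whole scaffold groups, largest first, so no scaffold
    # crosses split boundaries.
    train, valid, test = [], [], []
    n = len(smiles_list)
    for group in sorted(groups.values(), key=len, reverse=True):
        if len(train) + len(group) <= frac_train * n:
            train += group
        elif len(valid) + len(group) <= frac_valid * n:
            valid += group
        else:
            test += group
    return train, valid, test
```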
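The reported optimization setup maps directly onto PyTorch, as sketched below. This is an assumption-laden stand-in, not the authors' training script: `model` and `loader` are placeholders, and "linear scheduler" is interpreted as linear decay to zero over training.

```python
# Sketch of the reported setup: AdamW with eps=1e-8, learning rate 0.3 with
# a linear schedule, gradient clipping at max-norm 1.0, batch size 4,
# 1,000 epochs. The model and data below are placeholders.
import torch

model = torch.nn.Linear(16, 1)  # placeholder for the learned token embeddings
loader = [(torch.randn(4, 16), torch.randn(4, 1))]  # placeholder batches of size 4

epochs = 1000
optimizer = torch.optim.AdamW(model.parameters(), lr=0.3, eps=1e-8)
scheduler = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=1.0, end_factor=0.0, total_iters=epochs * len(loader)
)

for epoch in range(epochs):
    for x, y in loader:
        loss = torch.nn.functional.mse_loss(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
        scheduler.step()
```

The unusually high learning rate (0.3) is consistent with textual-inversion-style training, where only a small set of embedding vectors is optimized rather than full model weights.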