reproducibilityindex.ai

Data-Efficient Molecular Generation with Hierarchical Textual Inversion

Authors: Seojin Kim, Jaehyun Nam, Sihyun Yu, Younghoon Shin, Jinwoo Shin

ICML 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments demonstrate the superiority of HI-Mol with notable data-efficiency. For instance, on QM9, HI-Mol outperforms the prior state-of-the-art method with 50× less training data. We also show the effectiveness of molecules generated by HI-Mol in low-shot molecular property prediction.
Researcher Affiliation	Academia	¹Korea Advanced Institute of Science and Technology (KAIST), ²Korea University.
Pseudocode	Yes	Algorithm 1 Modification algorithm for an invalid SMILES string
Open Source Code	Yes	Code is available at https://github.com/Seojin-Kim/HI-Mol.
Open Datasets	Yes	We consider three datasets in the MoleculeNet (Wu et al., 2018) benchmark (originally designed for activity detection): HIV, BBBP, and BACE
Dataset Splits	Yes	We utilize a common splitting scheme for MoleculeNet dataset, scaffold split with split ratio of train:valid:test = 80:10:10 (Wu et al., 2018).
Hardware Specification	Yes	Our experiment is conducted for 1,000 epochs using a single NVIDIA GeForce RTX 3090 GPU with a batch size of 4.
Software Dependencies	No	The paper mentions software like 'MolT5-Large-Caption2Smiles', 'T5', and 'AdamW optimizer' but does not provide specific version numbers for any of these, nor for any programming languages or libraries used.
Experiment Setup	Yes	Our experiment is conducted for 1,000 epochs using a single NVIDIA GeForce RTX 3090 GPU with a batch size of 4. We use AdamW optimizer with ϵ = 1.0 × 10⁻⁸ and let the learning rate 0.3 with linear scheduler. We clip gradients with the maximum norm of 1.0.
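
The Dataset Splits row refers to a scaffold split with an 80:10:10 ratio. As a rough illustration of how such a split is commonly built for MoleculeNet-style data, here is a minimal sketch. It is not taken from the HI-Mol repository. The function name `scaffold_split` and its parameters are hypothetical, and the scaffold keys are assumed to be precomputed (for example, Bemis-Murcko scaffold SMILES from a cheminformatics toolkit). Molecules that share a scaffold always land in the same subset, and larger scaffold groups are assigned first.

```python
from collections import defaultdict


def scaffold_split(scaffolds, frac_train=0.8, frac_valid=0.1):
    """Split molecule indices into train/valid/test by scaffold group.

    `scaffolds` holds one precomputed scaffold key per molecule
    (an assumption of this sketch; computing scaffolds is out of scope).
    """
    # Collect the indices of all molecules sharing each scaffold key.
    groups = defaultdict(list)
    for idx, key in enumerate(scaffolds):
        groups[key].append(idx)

    # Largest groups first; ties are broken by first index for determinism.
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0]))

    n = len(scaffolds)
    train_cutoff = frac_train * n
    valid_cutoff = (frac_train + frac_valid) * n

    train, valid, test = [], [], []
    for group in ordered:
        if len(train) + len(group) <= train_cutoff:
            train.extend(group)   # whole group still fits in the train budget
        elif len(train) + len(valid) + len(group) <= valid_cutoff:
            valid.extend(group)   # otherwise try the validation budget
        else:
            test.extend(group)    # everything left over goes to test
    return train, valid, test


if __name__ == "__main__":
    # Toy scaffold keys standing in for real scaffold SMILES.
    keys = ["A"] * 5 + ["B"] * 3 + ["C", "D"]
    train, valid, test = scaffold_split(keys)
    print(len(train), len(valid), len(test))
```

Because whole groups are assigned at once, the realized ratio only approximates 80:10:10 on real data. The rarest scaffolds tend to end up in the test set, which is what makes this split harder than a random one.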