Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Data-Efficient Molecular Generation with Hierarchical Textual Inversion
Authors: Seojin Kim, Jaehyun Nam, Sihyun Yu, Younghoon Shin, Jinwoo Shin
ICML 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the superiority of HI-Mol with notable data-efficiency. For instance, on QM9, HI-Mol outperforms the prior state-of-the-art method with 50× less training data. We also show the effectiveness of molecules generated by HI-Mol in low-shot molecular property prediction. |
| Researcher Affiliation | Academia | ¹Korea Advanced Institute of Science and Technology (KAIST) ²Korea University. |
| Pseudocode | Yes | Algorithm 1 Modification algorithm for an invalid SMILES string |
| Open Source Code | Yes | Code is available at https://github.com/Seojin-Kim/HI-Mol. |
| Open Datasets | Yes | We consider three datasets in the MoleculeNet (Wu et al., 2018) benchmark (originally designed for activity detection): HIV, BBBP, and BACE |
| Dataset Splits | Yes | We utilize a common splitting scheme for MoleculeNet dataset, scaffold split with split ratio of train:valid:test = 80:10:10 (Wu et al., 2018). |
| Hardware Specification | Yes | Our experiment is conducted for 1,000 epochs using a single NVIDIA GeForce RTX 3090 GPU with a batch size of 4. |
| Software Dependencies | No | The paper mentions software like 'MolT5-Large-Caption2Smiles', 'T5', and 'AdamW optimizer' but does not provide specific version numbers for any of these, nor for any programming languages or libraries used. |
| Experiment Setup | Yes | Our experiment is conducted for 1,000 epochs using a single NVIDIA GeForce RTX 3090 GPU with a batch size of 4. We use AdamW optimizer with $\epsilon = 1.0 \times 10^{-8}$ and let the learning rate 0.3 with linear scheduler. We clip gradients with the maximum norm of 1.0. |
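
The Experiment Setup row quotes a learning rate of 0.3 with a linear scheduler and gradient clipping at a maximum norm of 1.0. The following is a minimal, dependency-free sketch of what those two settings mean numerically. It is not the authors' training code: the function names `linear_lr` and `clip_by_global_norm` are hypothetical, and the schedule shape (linear decay from the base rate to zero, with no warmup) is an assumption, since the quoted text does not specify it.

```python
import math

# Values quoted in the Experiment Setup row above.
LEARNING_RATE = 0.3
MAX_GRAD_NORM = 1.0


def linear_lr(step, total_steps, base_lr=LEARNING_RATE):
    """Learning rate at a given step.

    Assumption: linear decay from base_lr to 0 with no warmup.
    The source only says "linear scheduler".
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    remaining = max(0.0, 1.0 - step / total_steps)
    return base_lr * remaining


def clip_by_global_norm(grads, max_norm=MAX_GRAD_NORM):
    """Rescale a flat list of gradient values so their L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        # Already within the limit: return an unchanged copy.
        return list(grads)
    scale = max_norm / total_norm
    return [g * scale for g in grads]
```

Under these assumptions, `linear_lr(50, 100)` gives half of the base rate, and `clip_by_global_norm([3.0, 4.0])` rescales a gradient of norm 5 down to norm 1 while preserving its direction. In a framework-based setup the same roles would typically be played by the framework's own scheduler and clipping utilities.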