Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
LLaMo: Large Language Model-based Molecular Graph Assistant
Authors: Jinyoung Park, Minseong Bae, Dohwan Ko, Hyunwoo J. Kim
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experiments demonstrate that LLaMo shows the best performance on diverse tasks, such as molecular description generation, property prediction, and IUPAC name prediction. |
| Researcher Affiliation | Academia | Jinyoung Park, Minseong Bae, Dohwan Ko, Hyunwoo J. Kim. Department of Computer Science and Engineering, Korea University. EMAIL |
| Pseudocode | No | The paper describes methods in prose and figures but does not include pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code of LLaMo is available at https://github.com/mlvlab/LLaMo. |
| Open Datasets | Yes | For molecule description generation and property prediction, we use the datasets derived from PubChem and QM9 of MoleculeNet [64] as in Mol-Instructions [48]. For IUPAC name prediction, a dataset derived from [3] is used. To train the generalist variant of LLaMo, we use a training split of molecular description generation dataset of Mol-Instructions in stage 1. In stage 2, the model is instruction-tuned with a training split of description generation and property prediction instruction dataset of Mol-Instructions, IUPAC name prediction from [3], and our GPT-generated instruction-following data. [...] PubChem324k is constructed by collecting 324k molecules and their associated text information from the PubChem database. ChEBI-20 is the most commonly utilized benchmark in this task, consisting of selected 33,010 pairs of molecules and descriptions from ChEBI [72]. |
| Dataset Splits | Yes | To train the generalist variant of LLaMo, we use a training split of molecular description generation dataset of Mol-Instructions [48] in stage 1. In stage 2, the model is instruction-tuned with a training split of description generation and property prediction instruction dataset of Mol-Instructions, IUPAC name prediction from [3], and our GPT-generated instruction-following data. For the evaluation of molecular description generation and property question answering tasks, we use the test split of Mol-Instructions molecular description generation and property prediction datasets, which are sampled from PubChem [44] and QM9 dataset of MoleculeNet [64], respectively. |
| Hardware Specification | Yes | Our experiments are run on 4 A6000 GPUs or 4 V100 GPUs and 2 A6000 GPUs for LLaMA2 and Galactica, respectively. |
| Software Dependencies | No | The paper mentions software like PyTorch, PyTorch Geometric, Huggingface transformers, PEFT, and OpenDelta, but does not specify their version numbers. |
| Experiment Setup | Yes | In stage 1, the AdamW [63] optimizer is adapted with an initial learning rate of 1e-4 (minimum learning rate is 1e-5 and warmup learning rate is 1e-6). The warmup step is 1,000 and the cosine scheduler is applied. In stage 2, the initial learning rate is set to 5e-5 (minimum learning rate is 5e-6 and warmup learning rate is 5e-7). [...] We use LoRA to train the large language model in stage 2. |
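
The learning-rate schedule quoted in the Experiment Setup row can be illustrated with a short sketch. The quoted text gives the warmup, initial, and minimum learning rates and the 1,000 warmup steps, but it does not state the total number of training steps or the exact warmup shape. The function name `lr_at_step`, the `total_steps` argument, and the linear warmup are therefore assumptions made for illustration, not the authors' implementation.

```python
import math


def lr_at_step(step, total_steps, init_lr=1e-4, min_lr=1e-5,
               warmup_lr=1e-6, warmup_steps=1000):
    """Hypothetical helper: warmup followed by cosine decay.

    Defaults are the stage 1 values quoted above. total_steps is NOT
    given in the quoted text and must be supplied by the caller.
    """
    if step < warmup_steps:
        # Assumed linear ramp from warmup_lr up to init_lr.
        return warmup_lr + (init_lr - warmup_lr) * step / warmup_steps
    # Cosine decay from init_lr down to min_lr over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (init_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def stage2_lr_at_step(step, total_steps):
    # Stage 2 values quoted above: 5e-5 initial, 5e-6 minimum, 5e-7 warmup.
    # Reusing the same 1,000 warmup steps for stage 2 is an assumption.
    return lr_at_step(step, total_steps, init_lr=5e-5, min_lr=5e-6, warmup_lr=5e-7)
```

Under these assumptions the stage 1 rate starts at the warmup value, reaches the initial value when warmup ends, and decays to the minimum value by the final step. A real run would pass this rate to the optimizer at each step, or use the scheduler in the released code.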