Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Towards 3D Molecule-Text Interpretation in Language Models
Authors: Sihang Li, Zhiyuan Liu, Yanchen Luo, Xiang Wang, Xiangnan He, Kenji Kawaguchi, Tat-Seng Chua, Qi Tian
ICLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we conduct extensive experiments, including molecule-text retrieval, molecule captioning, and open-text molecular QA tasks, to demonstrate the effectiveness of 3D-MoLM for 3D molecule-text interpretation. |
| Researcher Affiliation | Collaboration | 1 University of Science and Technology of China; 2 National University of Singapore; 3 MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, USTC; 4 Institute of Dataspace, Hefei Comprehensive National Science Center; 5 Huawei Cloud |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We release our codes and datasets at https://github.com/lsh0520/3D-MoLM. |
| Open Datasets | Yes | We release our codes and datasets at https://github.com/lsh0520/3D-MoLM. |
| Dataset Splits | Yes | This curated subset is subsequently partitioned into train / validation / test sets containing 12K / 1K / 2K pairs, respectively. |
| Hardware Specification | Yes | The computation overhead is 40 GPU hours on NVIDIA A100 with BFloat16 Mixed precision. |
| Software Dependencies | No | The paper mentions software like RDKit, GPT-3.5, GPT-4, and Llama2, but does not specify their version numbers for reproducibility. |
| Experiment Setup | Yes | The Q-former attached with a frozen 3D molecular encoder is pretrained for 50 epochs and the number of query tokens in it is set to 8. AdamW (Loshchilov & Hutter, 2018) optimizer is adopted with a weight decay of 0.05 and a learning rate scheduler of a combination of linear warmup with 1000 steps and cosine decay, in which the peak and minimal learning rates are 1e-4 and 5e-6, respectively. And the batch size and maximal text length are 64 and 256, respectively. |
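
The learning rate schedule quoted in the Experiment Setup row (linear warmup for 1000 steps, then cosine decay between a peak of 1e-4 and a minimum of 5e-6) can be illustrated with a minimal sketch. This is not the authors' implementation. The function name `lr_at_step`, the `total_steps` argument, and a warmup that starts from 0 are assumptions made for illustration; the excerpt does not state the total number of training steps or the initial warmup value.

```python
import math

# Values taken from the quoted experiment setup.
PEAK_LR = 1e-4
MIN_LR = 5e-6
WARMUP_STEPS = 1000


def lr_at_step(step, total_steps, warmup_start_lr=0.0):
    """Return the learning rate at a given optimizer step.

    Assumptions (not stated in the excerpt): warmup starts at
    `warmup_start_lr`, and cosine decay runs from the end of warmup
    to `total_steps`, after which the rate stays at MIN_LR.
    """
    if step < WARMUP_STEPS:
        # Linear warmup from warmup_start_lr up to PEAK_LR.
        fraction = step / WARMUP_STEPS
        return warmup_start_lr + (PEAK_LR - warmup_start_lr) * fraction
    # Cosine decay from PEAK_LR down to MIN_LR.
    decay_steps = max(1, total_steps - WARMUP_STEPS)
    progress = min(1.0, (step - WARMUP_STEPS) / decay_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return MIN_LR + (PEAK_LR - MIN_LR) * cosine
```

For example, with a hypothetical `total_steps` of 10000, `lr_at_step(500, 10000)` falls halfway up the warmup ramp, `lr_at_step(1000, 10000)` is at the peak, and `lr_at_step(10000, 10000)` has decayed to the minimum.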