Towards 3D Molecule-Text Interpretation in Language Models

Authors: Sihang Li, Zhiyuan Liu, Yanchen Luo, Xiang Wang, Xiangnan He, Kenji Kawaguchi, Tat-Seng Chua, Qi Tian

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we conduct extensive experiments, including molecule-text retrieval, molecule captioning, and open-text molecular QA tasks, to demonstrate the effectiveness of 3D-MoLM for 3D molecule-text interpretation.
Researcher Affiliation | Collaboration | 1 University of Science and Technology of China; 2 National University of Singapore; 3 MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, USTC; 4 Institute of Dataspace, Hefei Comprehensive National Science Center; 5 Huawei Cloud
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | We release our codes and datasets at https://github.com/lsh0520/3D-MoLM.
Open Datasets | Yes | We release our codes and datasets at https://github.com/lsh0520/3D-MoLM.
Dataset Splits | Yes | This curated subset is subsequently partitioned into train / validation / test sets containing 12K / 1K / 2K pairs, respectively.
Hardware Specification | Yes | The computation overhead is 40 GPU hours on NVIDIA A100 with BFloat16 mixed precision.
Software Dependencies | No | The paper mentions software such as RDKit, GPT-3.5, GPT-4, and Llama 2, but does not specify version numbers for reproducibility.
Experiment Setup | Yes | The Q-Former attached to a frozen 3D molecular encoder is pretrained for 50 epochs, and the number of query tokens in it is set to 8. The AdamW (Loshchilov & Hutter, 2018) optimizer is adopted with a weight decay of 0.05 and a learning rate scheduler combining a 1000-step linear warmup with cosine decay, in which the peak and minimal learning rates are 1e-4 and 5e-6, respectively. The batch size and maximal text length are 64 and 256, respectively.
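
The experiment setup reported above (AdamW with weight decay 0.05, a 1000-step linear warmup followed by cosine decay between a peak learning rate of 1e-4 and a floor of 5e-6, batch size 64, and BFloat16 mixed precision on A100) corresponds to a fairly standard training configuration. Below is a minimal PyTorch sketch of that setup for illustration only; the placeholder model, the dummy data loader, and the total step count are assumptions, not the authors' released code.

# Minimal sketch of the reported optimization setup, assuming PyTorch.
# The model, data loader, and TOTAL_STEPS are placeholders for illustration.
import math
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

PEAK_LR, MIN_LR = 1e-4, 5e-6        # peak and minimal learning rates
WARMUP_STEPS = 1000                 # linear warmup steps
WEIGHT_DECAY = 0.05
BATCH_SIZE, MAX_TEXT_LEN = 64, 256  # max text length applies to tokenization, unused in this sketch
TOTAL_STEPS = 10_000                # in practice: 50 epochs * len(train_loader)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(128, 128).to(device)  # stand-in for the Q-Former with a frozen 3D encoder
optimizer = AdamW(model.parameters(), lr=PEAK_LR, weight_decay=WEIGHT_DECAY)

def lr_lambda(step: int) -> float:
    # Linear warmup to PEAK_LR, then cosine decay towards the MIN_LR floor.
    if step < WARMUP_STEPS:
        return step / max(1, WARMUP_STEPS)
    progress = min(1.0, (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return (MIN_LR + (PEAK_LR - MIN_LR) * cosine) / PEAK_LR

scheduler = LambdaLR(optimizer, lr_lambda)

# Dummy batches standing in for the 3D molecule-text pairs.
train_loader = [torch.randn(BATCH_SIZE, 128) for _ in range(10)]

for inputs in train_loader:
    # BFloat16 mixed precision, as reported for the A100 runs.
    with torch.autocast(device_type=device, dtype=torch.bfloat16):
        loss = model(inputs.to(device)).float().mean()  # placeholder loss
    loss.backward()
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()

Scaling the cosine term so that it decays to the 5e-6 floor rather than to zero matches the reported peak/minimal learning-rate pair; the exact scheduler implementation in the released repository may differ.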