Learning Invariant Molecular Representation in Latent Discrete Space

Authors: Xiang Zhuang, Qiang Zhang, Keyan Ding, Yatao Bian, Xiao Wang, Jingsong Lv, Hongyang Chen, Huajun Chen

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments on 18 real-world molecular datasets demonstrate that our model achieves stronger generalization against state-of-the-art baselines in the presence of various distribution shifts."
Researcher Affiliation | Collaboration | "Xiang Zhuang (1,2,3), Qiang Zhang (1,2,3), Keyan Ding (2), Yatao Bian (4), Xiao Wang (5), Jingsong Lv (6), Hongyang Chen (6), Huajun Chen (1,2,3). (1) College of Computer Science and Technology, Zhejiang University; (2) ZJU-Hangzhou Global Scientific and Technological Innovation Center; (3) Zhejiang University - Ant Group Joint Laboratory of Knowledge Graph; (4) Tencent AI Lab; (5) School of Software, Beihang University; (6) Zhejiang Lab"
Pseudocode | No | The paper describes the methodology in detail using mathematical equations and textual explanations (e.g., Section 4, "Method"), but it does not include any clearly labeled "Pseudocode" or "Algorithm" blocks. (A hedged sketch of the quantization step the method relies on is given after this table.)
Open Source Code | Yes | "Our code is available at https://github.com/HICAI-ZJU/iMoLD."
Open Datasets | Yes | "We employ two real-world benchmarks for OOD molecular representation learning. Details of datasets are in Appendix A. GOOD [63], which is a systematic benchmark tailored specifically for graph OOD problems. ... DrugOOD [13], which is an OOD benchmark for AI-aided drug discovery. ... We use the latest data released on the official webpage (https://drugood.github.io/) based on the ChEMBL 30 database (http://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/releases/chembl_30)."
Dataset Splits | Yes | "We use the default dataset split proposed in each benchmark. For covariate shift, the training, validation and testing sets are obtained based on environments without interactions. For concept shift, a screening approach is leveraged to scan and select molecules in the dataset. Statistics of each dataset are in Table 4." (A sketch of an environment-disjoint covariate-shift split is given after this table.)
Hardware Specification | Yes | "Experiments are conducted on one 24GB NVIDIA RTX 3090 GPU."
Software Dependencies | No | "We implement the proposed iMoLD in PyTorch [69] and PyG [70]." The paper names the frameworks used but does not specify their version numbers. (A snippet for recording the versions of a reproduction environment is given after this table.)
Experiment Setup | Yes | "For all the datasets, we select hyper-parameters by ranging the codebook size |C| from {100, 500, 1000, 4000, 10000}, threshold γ from {0.1, 0.5, 0.7, 0.9}, λ1 from {0.001, 0.01, 0.1, 0.5}, λ2 from {0.01, 0.1, 0.5, 1}, λ3 from {0.01, 0.1, 0.3, 0.5, 1}, and batch size from {32, 64, 128, 256, 512}. For datasets in DrugOOD, we also select dropout rate from {0.1, 0.3, 0.5}. The maximum number of epochs is set to 200 and the learning rate is set to 0.001. Please refer to Table 6 for a detailed hyper-parameter configuration of various datasets." (The full search grid is transcribed in code after this table.)