Learning Invariant Molecular Representation in Latent Discrete Space
Authors: Xiang Zhuang, Qiang Zhang, Keyan Ding, Yatao Bian, Xiao Wang, Jingsong Lv, Hongyang Chen, Huajun Chen
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on 18 real-world molecular datasets demonstrate that our model achieves stronger generalization against state-of-the-art baselines in the presence of various distribution shifts. |
| Researcher Affiliation | Collaboration | Xiang Zhuang (1,2,3), Qiang Zhang (1,2,3), Keyan Ding (2), Yatao Bian (4), Xiao Wang (5), Jingsong Lv (6), Hongyang Chen (6), Huajun Chen (1,2,3). Affiliations: (1) College of Computer Science and Technology, Zhejiang University; (2) ZJU-Hangzhou Global Scientific and Technological Innovation Center; (3) Zhejiang University-Ant Group Joint Laboratory of Knowledge Graph; (4) Tencent AI Lab; (5) School of Software, Beihang University; (6) Zhejiang Lab |
| Pseudocode | No | The paper describes the methodology in detail using mathematical equations and textual explanations (e.g., Section 4 'Method'), but it does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/HICAI-ZJU/iMoLD. |
| Open Datasets | Yes | We employ two real-world benchmarks for OOD molecular representation learning. Details of datasets are in Appendix A. GOOD [63], which is a systematic benchmark tailored specifically for graph OOD problems. ... DrugOOD [13], which is an OOD benchmark for AI-aided drug discovery. ... We use the latest data released on the official webpage (footnote 3) based on the ChEMBL 30 database (footnote 4). ... Footnote 3: https://drugood.github.io/ Footnote 4: http://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/releases/chembl_30 |
| Dataset Splits | Yes | We use the default dataset split proposed in each benchmark. For covariate shift, the training, validation and testing sets are obtained based on environments without interactions. For concept shift, a screening approach is leveraged to scan and select molecules in the dataset. Statistics of each dataset are in Table 4. |
| Hardware Specification | Yes | Experiments are conducted on one 24GB NVIDIA RTX 3090 GPU. |
| Software Dependencies | No | We implement the proposed iMoLD in PyTorch [69] and PyG [70]. The paper refers to the frameworks used but does not specify their version numbers. |
| Experiment Setup | Yes | For all the datasets, we select hyper-parameters by ranging the codebook size \|C\| from {100, 500, 1000, 4000, 10000}, threshold γ from {0.1, 0.5, 0.7, 0.9}, λ1 from {0.001, 0.01, 0.1, 0.5}, λ2 from {0.01, 0.1, 0.5, 1}, λ3 from {0.01, 0.1, 0.3, 0.5, 1}, and batch size from {32, 64, 128, 256, 512}. For datasets in DrugOOD, we also select dropout rate from {0.1, 0.3, 0.5}. The maximum number of epochs is set to 200 and the learning rate is set to 0.001. Please refer to Table 6 for a detailed hyper-parameter configuration of various datasets. |
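
The search ranges quoted in the Experiment Setup row can be written down as a small configuration sketch. The value lists below are taken from that row; the dictionary keys, the helper names `iter_configs` and `grid_size`, and the choice to enumerate the full Cartesian product are illustrative assumptions. The paper states only the candidate values, not whether the search was exhaustive, and this is not code from the authors' repository.

```python
from itertools import product
from math import prod

# Candidate values quoted in the Experiment Setup row.
# Key names are hypothetical, chosen only for readability.
SEARCH_SPACE = {
    "codebook_size": [100, 500, 1000, 4000, 10000],  # |C|
    "gamma": [0.1, 0.5, 0.7, 0.9],                   # threshold γ
    "lambda1": [0.001, 0.01, 0.1, 0.5],
    "lambda2": [0.01, 0.1, 0.5, 1],
    "lambda3": [0.01, 0.1, 0.3, 0.5, 1],
    "batch_size": [32, 64, 128, 256, 512],
}

# Dropout is searched only for DrugOOD datasets.
DRUGOOD_EXTRA = {"dropout": [0.1, 0.3, 0.5]}

# Settings reported as fixed across datasets.
FIXED = {"max_epochs": 200, "learning_rate": 0.001}


def _space(drugood: bool) -> dict:
    """Return the search space, adding dropout for DrugOOD datasets."""
    space = dict(SEARCH_SPACE)
    if drugood:
        space.update(DRUGOOD_EXTRA)
    return space


def iter_configs(drugood: bool = False):
    """Yield every combination of the candidate values (assumed full grid)."""
    space = _space(drugood)
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield {**FIXED, **dict(zip(keys, values))}


def grid_size(drugood: bool = False) -> int:
    """Number of combinations in the assumed full grid."""
    return prod(len(v) for v in _space(drugood).values())


if __name__ == "__main__":
    print(grid_size(), grid_size(drugood=True))
    print(next(iter_configs(drugood=True)))
```

Under the full-grid assumption the space has 8,000 combinations per dataset, or 24,000 with dropout for DrugOOD, so a random or staged search over the same lists may be the more practical reading. The per-dataset values actually used are the ones reported in Table 6 of the paper.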