Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Learning Invariant Molecular Representation in Latent Discrete Space
Authors: Xiang Zhuang, Qiang Zhang, Keyan Ding, Yatao Bian, Xiao Wang, Jingsong Lv, Hongyang Chen, Huajun Chen
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on 18 real-world molecular datasets demonstrate that our model achieves stronger generalization against state-of-the-art baselines in the presence of various distribution shifts. |
| Researcher Affiliation | Collaboration | Xiang Zhuang¹,²,³, Qiang Zhang¹,²,³, Keyan Ding², Yatao Bian⁴, Xiao Wang⁵, Jingsong Lv⁶, Hongyang Chen⁶, Huajun Chen¹,²,³. ¹College of Computer Science and Technology, Zhejiang University; ²ZJU-Hangzhou Global Scientific and Technological Innovation Center; ³Zhejiang University Ant Group Joint Laboratory of Knowledge Graph; ⁴Tencent AI Lab; ⁵School of Software, Beihang University; ⁶Zhejiang Lab |
| Pseudocode | No | The paper describes the methodology in detail using mathematical equations and textual explanations (e.g., Section 4 'Method'), but it does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/HICAI-ZJU/iMoLD. |
| Open Datasets | Yes | We employ two real-world benchmarks for OOD molecular representation learning. Details of datasets are in Appendix A. GOOD [63], which is a systematic benchmark tailored specifically for graph OOD problems. ... DrugOOD [13], which is a OOD benchmark for AI-aided drug discovery. ... We use the latest data released on the official webpage³ based on the ChEMBL 30 database⁴. ... ³ https://drugood.github.io/ ⁴ http://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/releases/chembl_30 |
| Dataset Splits | Yes | We use the default dataset split proposed in each benchmark. For covariate shift, the training, validation and testing sets are obtained based on environments without interactions. For concept shift, a screening approach is leveraged to scan and select molecules in the dataset. Statistics of each dataset are in Table 4. |
| Hardware Specification | Yes | Experiments are conducted on one 24GB NVIDIA RTX 3090 GPU. |
| Software Dependencies | No | We implement the proposed iMoLD in Pytorch [69] and PyG [70]. The paper refers to the frameworks used but does not specify their version numbers. |
| Experiment Setup | Yes | For all the datasets, we select hyper-parameters by ranging the code book size \|C\| from {100, 500, 1000, 4000, 10000}, threshold γ from {0.1, 0.5, 0.7, 0.9}, λ1 from {0.001, 0.01, 0.1, 0.5}, λ2 from {0.01, 0.1, 0.5, 1}, λ3 from {0.01, 0.1, 0.3, 0.5, 1}, and batch size from {32, 64, 128, 256, 512}. For datasets in DrugOOD, we also select dropout rate from {0.1, 0.3, 0.5}. The maximum number of epochs is set to 200 and the learning rate is set to 0.001. Please refer to Table 6 for a detailed hyper-parameter configuration of various datasets. |
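
To give a sense of the size of the search space quoted in the Experiment Setup row, the following is a minimal Python sketch that writes the listed ranges as a dictionary and enumerates their combinations. The key names (`codebook_size`, `gamma`, `lambda1`, and so on) and the helper functions are illustrative assumptions, not identifiers from the authors' code. The quoted text also does not say whether the full Cartesian grid or only a subset was searched, so the exhaustive enumeration here is an assumption.

```python
from itertools import product

# Ranges quoted in the Experiment Setup row; key names are hypothetical.
SEARCH_SPACE = {
    "codebook_size": [100, 500, 1000, 4000, 10000],
    "gamma": [0.1, 0.5, 0.7, 0.9],
    "lambda1": [0.001, 0.01, 0.1, 0.5],
    "lambda2": [0.01, 0.1, 0.5, 1],
    "lambda3": [0.01, 0.1, 0.3, 0.5, 1],
    "batch_size": [32, 64, 128, 256, 512],
}

# Extra range reported for the DrugOOD datasets only.
DRUGOOD_EXTRA = {"dropout": [0.1, 0.3, 0.5]}

# Values reported as fixed across datasets.
FIXED = {"max_epochs": 200, "learning_rate": 0.001}


def iter_configs(space, fixed):
    """Yield one dict per point of the full Cartesian grid (an assumption)."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield {**fixed, **dict(zip(keys, values))}


def grid_size(space):
    """Count the grid points without materializing them."""
    n = 1
    for candidates in space.values():
        n *= len(candidates)
    return n


good_size = grid_size(SEARCH_SPACE)
drugood_size = grid_size({**SEARCH_SPACE, **DRUGOOD_EXTRA})
first_config = next(iter_configs(SEARCH_SPACE, FIXED))

print(good_size, drugood_size)
print(first_config)
```

Under the full-grid assumption, the shared ranges give 5 × 4 × 4 × 4 × 5 × 5 = 8000 combinations, and the added dropout range for DrugOOD triples that to 24000. The per-dataset values actually used are those reported in the paper's Table 6.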