Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Learning to Group Auxiliary Datasets for Molecule
Authors: Tinglin Huang, Ziniu Hu, Rex Ying
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experiments demonstrate the efficiency and effectiveness of MolGroup, showing an average improvement of 4.41%/3.47% for GIN/Graphormer trained with the group of molecule datasets selected by MolGroup on 11 target molecule datasets. |
| Researcher Affiliation | Academia | Tinglin Huang¹, Ziniu Hu², Rex Ying¹; ¹Yale University, ²University of California, Los Angeles |
| Pseudocode | Yes | The pseudo-code is presented in Algo.1. |
| Open Source Code | Yes | Source code is available at https://github.com/Graph-and-Geometric-Learning/MolGroup. |
| Open Datasets | Yes | Our study utilizes 15 molecule datasets of varying sizes obtained from MoleculeNet [51, 18] and ChEMBL [33], which can be categorized into three groups: medication, quantum mechanics, and chemical analysis. All the involved datasets can be accessed and downloaded from OGB⁴ or MoleculeNet repository⁵. (Footnote 4: https://ogb.stanford.edu/, Footnote 5: https://moleculenet.org/) |
| Dataset Splits | Yes | We follow the original split setting, where qm8 and qm9 are randomly split, and scaffold splitting is used for the others. |
| Hardware Specification | Yes | The experiments are conducted on a single Linux server with The Intel Xeon Gold 6240 36-Core Processor, 361G RAM, and 4 NVIDIA A100-40GB. |
| Software Dependencies | Yes | Our method is implemented on PyTorch 1.10.0 and Python 3.9.13. |
| Experiment Setup | Yes | As for GIN [53], we fix the batch size as 128 and train the model for 50 epochs. We use Adam [24] with a learning rate of 0.001 for optimization. The hidden size and number of layers are set as 300 and 5 respectively. We set the dropout rate as 0.5 and apply batchnorm [21] in each layer. All the results are reported after 5 different random seeds. As for Graphormer [57], we fix the batch size as 128 and train the model for 30 epochs. AdamW [31] with a learning rate of 0.0001 is used as the optimizer. The hidden size, number of layers, and number of attention heads are set as 512, 5, and 8 respectively. We set the dropout rate and attention dropout rate as 0.1 and 0.3. |
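
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch, which may help when attempting to reproduce the runs. This is only an illustration: the `TrainConfig` class, its field names, and the seed values `0..4` are assumptions made here and are not taken from the paper or the MolGroup repository. The quoted text states that five random seeds were used but does not list them, and it mentions the seeds in the GIN passage only.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the settings quoted in the table above."""
    model: str
    optimizer: str
    learning_rate: float
    batch_size: int
    epochs: int
    hidden_size: int
    num_layers: int
    dropout: float
    batch_norm: bool = False                   # only stated for GIN
    num_heads: Optional[int] = None            # only stated for Graphormer
    attention_dropout: Optional[float] = None  # only stated for Graphormer


# GIN settings as quoted: Adam, lr 0.001, batch 128, 50 epochs,
# hidden size 300, 5 layers, dropout 0.5, batchnorm in each layer.
GIN_CONFIG = TrainConfig(
    model="GIN", optimizer="Adam", learning_rate=1e-3, batch_size=128,
    epochs=50, hidden_size=300, num_layers=5, dropout=0.5, batch_norm=True,
)

# Graphormer settings as quoted: AdamW, lr 0.0001, batch 128, 30 epochs,
# hidden size 512, 5 layers, 8 heads, dropout 0.1, attention dropout 0.3.
GRAPHORMER_CONFIG = TrainConfig(
    model="Graphormer", optimizer="AdamW", learning_rate=1e-4, batch_size=128,
    epochs=30, hidden_size=512, num_layers=5, dropout=0.1,
    num_heads=8, attention_dropout=0.3,
)

# Assumption: the actual seed values are not given in the quoted text.
SEEDS = list(range(5))

if __name__ == "__main__":
    for cfg in (GIN_CONFIG, GRAPHORMER_CONFIG):
        print(asdict(cfg))
```

Keeping the two configurations as frozen dataclasses makes the differences between the GIN and Graphormer runs easy to compare and prevents accidental mutation between seeds.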