Fragment-based Pretraining and Finetuning on Molecular Graphs
Authors: Kha-Dinh Luong, Ambuj K Singh
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our graph fragment-based pretraining (GraphFP) advances the performances on 5 out of 8 common molecular benchmarks and improves the performances on long-range biological benchmarks by at least 11.5%. Code is available at: https://github.com/lvkd84/GraphFP. (Abstract) and '4 Experiments' |
| Researcher Affiliation | Academia | Kha-Dinh Luong, Ambuj Singh; Department of Computer Science, University of California, Santa Barbara; Santa Barbara, CA 93106; {vluong,ambuj}@cs.ucsb.edu |
| Pseudocode | No | The paper describes the methodology using text and mathematical equations but does not include structured pseudocode or algorithm blocks with explicit labels such as 'Algorithm' or 'Pseudocode'. |
| Open Source Code | Yes | Code is available at: https://github.com/lvkd84/GraphFP. |
| Open Datasets | Yes | We use a processed subset containing 456K molecules from the ChEMBL database [24] for pretraining. |
| Dataset Splits | Yes | For downstream evaluation, we consider 8 binary graph classification tasks from MoleculeNet [36] with scaffold split [15]. Moreover, to assess the ability of the models in recognizing global arrangement, we consider two graph prediction tasks on large peptide molecules from the Long-range Graph Benchmark [7]. Long-range graph benchmarks are split using stratified random split. (A minimal scaffold-split sketch follows the table.) |
| Hardware Specification | Yes | All experiments are run on individual Tesla V100 GPUs. |
| Software Dependencies | No | The paper mentions software components like GIN and AdamW optimizer but does not specify their version numbers or the versions of other key software dependencies (e.g., Python, PyTorch/TensorFlow libraries). |
| Experiment Setup | Yes | All pretrainings are done in 100 epochs, with AdamW optimizer, batch size 256, and initial learning rate 1 × 10⁻³. We reduce the learning rate by a factor of 0.1 every 5 epochs without improvement. On graph classification benchmarks, to ensure comparability, our finetuning setting is mostly similar to that of previous works [15, 23]: 100 epochs, Adam optimizer, batch size 256, initial learning rate 1 × 10⁻³, and dropout rate chosen from {0.0, 0.5}. (A configuration sketch follows the table.) |
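
The scaffold split cited in the Dataset Splits row groups molecules by their Bemis-Murcko scaffold so that structurally related molecules never land in different splits. Below is a minimal sketch of that procedure using RDKit; it is an illustration rather than the authors' split code, and the `scaffold_split` helper and the 80/10/10 fractions are assumptions (the quoted text does not state the fractions).

```python
# Hypothetical scaffold-split sketch (not the authors' code): group molecules
# by Bemis-Murcko scaffold, then assign whole scaffold groups, largest first,
# to train/valid/test so related structures never leak across splits.
from collections import defaultdict
from rdkit.Chem.Scaffolds import MurckoScaffold

def scaffold_split(smiles_list, frac_train=0.8, frac_valid=0.1):
    # Map each scaffold SMILES to the indices of molecules sharing it.
    scaffold_to_idx = defaultdict(list)
    for i, smi in enumerate(smiles_list):
        scaffold = MurckoScaffold.MurckoScaffoldSmiles(smiles=smi, includeChirality=True)
        scaffold_to_idx[scaffold].append(i)

    # Assign the largest scaffold groups first; a group is never split.
    groups = sorted(scaffold_to_idx.values(), key=len, reverse=True)

    n = len(smiles_list)
    train, valid, test = [], [], []
    for group in groups:
        if len(train) + len(group) <= frac_train * n:
            train.extend(group)
        elif len(valid) + len(group) <= frac_valid * n:
            valid.extend(group)
        else:
            test.extend(group)
    return train, valid, test
```

Because whole scaffold groups are kept together, test molecules are structurally unlike those seen in training, which makes scaffold splits a harder and more realistic evaluation than random splits.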
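
The Experiment Setup row maps onto a standard PyTorch optimizer configuration: AdamW at an initial learning rate of 1 × 10⁻³ for 100 pretraining epochs, with the learning rate cut by a factor of 0.1 after 5 epochs without improvement. The sketch below is a hedged illustration under that reading; the linear model and the random batch are placeholders standing in for the paper's GIN encoder and ChEMBL data, and the paper does not specify library versions.

```python
import torch

# Placeholder model and one dummy batch of the reported size 256; the paper's
# actual encoder is a GIN over molecular graphs, not a linear layer.
model = torch.nn.Linear(16, 1)
x, y = torch.randn(256, 16), torch.randn(256, 1)

# "AdamW optimizer ... initial learning rate 1e-3"
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
# "Reduce the learning rate by a factor of 0.1 every 5 epochs without improvement."
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.1, patience=5
)

for epoch in range(100):  # pretraining runs for 100 epochs
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
    scheduler.step(loss.item())  # plateau-based learning-rate decay
```

For finetuning, the quoted setup swaps AdamW for Adam (`torch.optim.Adam`) with the same batch size and initial learning rate, and selects the dropout rate from {0.0, 0.5}.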