Grouping Matrix Based Graph Pooling with Adaptive Number of Clusters

Authors: Sung Moon Ko, Sungjun Cho, Dae-Woong Jeong, Sehui Han, Moontae Lee, Honglak Lee

AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive evaluations on molecular property prediction tasks demonstrate that our method outperforms conventional methods. [...] For empirical evaluation, we compare the performance of GMPOOL and NGMPOOL against that of five other pooling approaches.
Researcher Affiliation | Collaboration | 1 LG AI Research, 2 University of Illinois Chicago; {sungmoon.ko, sungjun.cho, dw.jeong, hansse.han, moontae.lee, honglak}@lgresearch.ai
Pseudocode | No | The paper describes algorithms and processes in narrative text and mathematical equations but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement or link for the open-sourcing of the code for GMPOOL or NGMPOOL. It only states: 'Implementations of other pooling baselines are borrowed from the pytorch-geometric library.'
Open Datasets | Yes | We arrange a total of five datasets to test our algorithms: two are open datasets collected from MoleculeNet (Ramsundar et al. 2019) and BindingDB (Chen X 2001, 2002; Chen and Gilson 2002; Liu T 2007; Gilson et al. 2015); three are manually collected and arranged from the literature, including scientific articles and patents.
Dataset Splits | Yes | Every experiment is tested under a five-fold setting with uniform sampling and a dedicated 10% test set to secure the results. (A split sketch follows the table.)
Hardware Specification | Yes | A single RTX 3090 is used for the experiments.
Software Dependencies | No | The paper mentions 'RDKit' and the 'pytorch-geometric library' but does not specify version numbers for these or any other software dependencies, which is required for reproducibility.
Experiment Setup | Yes | For the DMPNN backbone of the model, we use the same hidden size of 200 across all three independent layers: the initial edge features with dimension de and node features with dimension dn are passed through layers of dimension de × 200 and dn × 200, respectively, with ReLU activation. The message-passing module passes node embeddings through a linear layer of dimension 200 × 200, followed by ReLU activation and a 0.15 dropout layer. For graph representation we use a global average pooling scheme. GMPOOL and NGMPOOL construct the grouping matrix via a 200 × 1 linear layer and sigmoid activation, without any parameters related to cluster numbers or thresholds. We use a batch size of 80 and the Adam optimizer for all model training. For baseline pooling methods that require the cluster size as a hyperparameter, we perform grid search across candidates following previous work, and present the best results. (A configuration sketch follows the table.)
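To make the quoted split protocol concrete, here is a minimal sketch of the evaluation splits, assuming scikit-learn utilities; the `make_splits` helper name and the shared random seed are our own choices, as the paper does not publish split code.

```python
import numpy as np
from sklearn.model_selection import KFold, train_test_split

def make_splits(n_samples, seed=0):
    """Yield (train, val, test) index arrays matching the quoted protocol:
    a dedicated 10% test set sampled uniformly, then five folds over
    the remaining 90% (seed and helper name are assumptions)."""
    idx = np.arange(n_samples)
    # Hold out 10% as the dedicated test set, sampled uniformly at random
    dev_idx, test_idx = train_test_split(idx, test_size=0.1, random_state=seed)
    # Rotate five folds over the remaining data for train/validation
    kf = KFold(n_splits=5, shuffle=True, random_state=seed)
    for train_pos, val_pos in kf.split(dev_idx):
        yield dev_idx[train_pos], dev_idx[val_pos], test_idx
```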
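Similarly, the experiment-setup row can be read as the PyTorch configuration sketch below. The dimensions, activations, dropout rate, pooling scheme, batch size, and optimizer come from the quoted text; the class name `BackboneConfigSketch` and the pairwise rule feeding node embeddings into the 200 × 1 grouping scorer are our assumptions, since the excerpt does not specify how node pairs are combined.

```python
import torch
import torch.nn as nn

HIDDEN = 200  # hidden size reported for all three backbone layers

class BackboneConfigSketch(nn.Module):
    """Minimal sketch of the reported setup; wiring beyond the quoted
    dimensions (e.g., how node pairs feed the grouping scorer) is our
    assumption, not the paper's specification."""

    def __init__(self, d_node, d_edge):
        super().__init__()
        # Initial feature encoders: d_n -> 200 and d_e -> 200, with ReLU
        self.node_enc = nn.Sequential(nn.Linear(d_node, HIDDEN), nn.ReLU())
        self.edge_enc = nn.Sequential(nn.Linear(d_edge, HIDDEN), nn.ReLU())
        # Message-passing update: 200 -> 200 linear, ReLU, 0.15 dropout
        self.update = nn.Sequential(
            nn.Linear(HIDDEN, HIDDEN), nn.ReLU(), nn.Dropout(0.15)
        )
        # Grouping-matrix scorer: 200 -> 1 linear + sigmoid, with no
        # cluster-count or threshold hyperparameters
        self.group_score = nn.Sequential(nn.Linear(HIDDEN, 1), nn.Sigmoid())

    def grouping_matrix(self, h):
        # h: (num_nodes, 200) node embeddings after message passing.
        # ASSUMPTION: score each node pair from the elementwise product
        # of their embeddings; the excerpt omits the pairing rule.
        pair = h.unsqueeze(0) * h.unsqueeze(1)    # (N, N, 200)
        return self.group_score(pair).squeeze(-1)  # (N, N), entries in (0, 1)

    def readout(self, h):
        # Global average pooling for the graph representation
        return h.mean(dim=0)

# Training config from the excerpt: batch size 80, Adam optimizer, e.g.
#   model = BackboneConfigSketch(d_node=..., d_edge=...)
#   opt = torch.optim.Adam(model.parameters())
```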