Weisfeiler and Leman go sparse: Towards scalable higher-order graph embeddings

Authors: Christopher Morris, Gaurav Rattan, Petra Mutzel

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our intention here is to investigate the benefits of the local, sparse algorithms, both kernel and neural architectures, compared to the global, dense algorithms, and standard kernel and GNN baselines. More precisely, we address the following questions:
Q1 Do the local algorithms, both kernel and neural architectures, lead to improved classification and regression scores on real-world benchmark datasets compared to global, dense algorithms and standard baselines?
Q2 Does the δ-k-LWL+ lead to improved classification accuracies compared to the δ-k-LWL? Does it lead to higher computation times?
Q3 Do the local algorithms prevent overfitting to the training set?
Q4 How much do the local algorithms speed up the computation time compared to the non-local algorithms or dense neural architectures?
The source code of all methods and evaluation procedures is available at https://www.github.com/chrsmrrs/sparsewl. Datasets: To evaluate kernels, we use the following, well-known, small-scale datasets: ENZYMES [98, 13], IMDB-BINARY, IMDB-MULTI [119], NCI1, NCI109 [109], PTC_FM [53], PROTEINS [31, 13], and REDDIT-BINARY [119]. ... Results and discussion: In the following we answer questions Q1 to Q4.
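For context on the method being evaluated: the paper's δ-k-LWL algorithms are local, sparse variants of the k-dimensional Weisfeiler-Leman algorithm. Below is a minimal sketch of 1-dimensional WL color refinement, the base case that the k-WL hierarchy generalizes. This is an illustration under our own assumptions, not the authors' implementation (that lives in the repository linked above).

```python
# Minimal sketch of 1-dimensional Weisfeiler-Leman color refinement, the
# base case generalized by the k-WL hierarchy and the local delta-k-LWL
# variants studied in the paper. Illustrative only.

def wl_refine(adj, labels, iterations=3):
    """adj: dict node -> list of neighbors; labels: dict node -> initial color."""
    colors = dict(labels)
    for _ in range(iterations):
        # A node's new color hashes its old color together with the
        # sorted multiset of its neighbors' colors.
        new_colors = {
            v: hash((colors[v], tuple(sorted(colors[u] for u in adj[v]))))
            for v in adj
        }
        # Refinement only splits color classes, so the partition is
        # stable once the number of distinct colors stops growing.
        if len(set(new_colors.values())) == len(set(colors.values())):
            break
        colors = new_colors
    return colors

# Example: a 4-cycle with uniform initial colors stays monochromatic.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
print(wl_refine(cycle, {v: 0 for v in cycle}))
```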
Researcher Affiliation | Academia | CERC in Data Science for Real-Time Decision-Making, Polytechnique Montréal; Department of Computer Science, RWTH Aachen University; Department of Computer Science, University of Bonn
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | The source code of all methods and evaluation procedures is available at https://www.github.com/chrsmrrs/sparsewl.
Open Datasets | Yes | Datasets: To evaluate kernels, we use the following, well-known, small-scale datasets: ENZYMES [98, 13], IMDB-BINARY, IMDB-MULTI [119], NCI1, NCI109 [109], PTC_FM [53], PROTEINS [31, 13], and REDDIT-BINARY [119]. ... For the neural architectures, we used the large-scale molecular regression datasets ZINC [34, 57] and ALCHEMY [21]. ... QM9 [91, 112] regression dataset. All datasets can be obtained from http://www.graphlearning.io [84].
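The small-scale datasets listed above are TU benchmarks, so one way to fetch them programmatically is PyTorch Geometric's TUDataset loader. The loader choice is our assumption; the paper itself only points to the graphlearning.io URL.

```python
# Fetch one of the TU benchmarks named above via PyTorch Geometric.
from torch_geometric.datasets import TUDataset

dataset = TUDataset(root="data/ENZYMES", name="ENZYMES")
print(len(dataset), "graphs,", dataset.num_classes, "classes")
```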
Dataset Splits | No | The paper mentions 'training versus test accuracy', refers to the 'evaluation guidelines outlined in [84]', and points to hyperparameter selection routines in Appendix E.2, but it does not give explicit training/validation/test splits (e.g., percentages, sample counts, or named standard splits) in the main text.
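For readers who want to approximate the missing split protocol: a common convention for these TU benchmarks is stratified 10-fold cross-validation. The sketch below uses placeholder labels and our own seed, and makes no claim about the paper's actual folds.

```python
# Sketch of a stratified 10-fold split; labels and seed are placeholders.
import numpy as np
from sklearn.model_selection import StratifiedKFold

y = np.array([0, 1] * 50)  # placeholder: one class label per graph
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
for fold, (train_idx, test_idx) in enumerate(skf.split(np.zeros(len(y)), y)):
    # train_idx could be split further to hold out a validation set
    # for hyperparameter selection.
    print(f"fold {fold}: {len(train_idx)} train / {len(test_idx)} test")
```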
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions using 'PYTORCH GEOMETRIC [36]' and a 'Python-wrapped C++11 preprocessing routine' but does not specify version numbers for these software components.
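Because no versions are pinned, anyone reproducing the results would need to record their own environment, for example:

```python
# Record the installed versions of the dependencies the paper names.
import torch
import torch_geometric

print("torch:", torch.__version__)
print("torch_geometric:", torch_geometric.__version__)
```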
Experiment Setup | No | The paper mentions 'hyperparameter selection routines' in Appendix E.2 but does not provide specific experimental setup details, such as hyperparameter values, training configurations, or system-level settings, within the main text.
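For orientation, graph-kernel experiments of this kind are typically scored with a C-SVM on a precomputed Gram matrix, with C selected by inner cross-validation. The grid, kernel matrix, and labels below are illustrative assumptions, not the settings from Appendix E.2.

```python
# Illustrative hyperparameter selection for a graph-kernel experiment:
# a C-SVM on a precomputed Gram matrix, with C chosen by cross-validation.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

K = np.eye(20)             # placeholder n x n kernel (Gram) matrix
y = np.array([0, 1] * 10)  # placeholder graph labels
grid = GridSearchCV(
    SVC(kernel="precomputed"),
    param_grid={"C": [10.0 ** i for i in range(-3, 4)]},
    cv=5,
)
grid.fit(K, y)
print("selected C:", grid.best_params_["C"])
```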