Efficient Learning of Linear Graph Neural Networks via Node Subsampling

Authors: Seiyun Shin, Ilan Shomorony, Han Zhao

NeurIPS 2023

Reproducibility assessment. Each entry lists the variable, the result, and the supporting excerpt (LLM response).
Research Type: Experimental. "We support our theoretical findings via numerical experiments to validate the performance of Algorithm 1 and Algorithm 2 in a non-asymptotic setting. Dataset and evaluation methods: We consider benchmark datasets from the Open Graph Benchmark (OGB) [29], Stanford Network Analysis Project (SNAP) [30], and House dataset [31]. For evaluation methods, we compute the mean squared error (MSE), wall-clock run-time, and peak memory usage of our two proposed schemes and five baselines."
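Since every scheme is compared on MSE, wall-clock run-time, and peak memory usage, a minimal Python measurement harness along these lines is sketched below. This is an assumption about the shape of such a harness, not the paper's actual evaluation code; `solver` is a hypothetical stand-in for any of the compared methods.

```python
import time
import tracemalloc

import numpy as np

def evaluate(solver, M, y):
    """Report (MSE, wall-clock seconds, peak bytes) for one regression scheme.

    `solver` is a hypothetical stand-in: any callable mapping (M, y) to an
    estimated weight vector w_hat, e.g. a proposed scheme or a baseline.
    """
    tracemalloc.start()
    t0 = time.perf_counter()
    w_hat = solver(M, y)                         # run the scheme under test
    elapsed = time.perf_counter() - t0
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    mse = float(np.mean((M @ w_hat - y) ** 2))   # mean squared error of the fit
    return mse, elapsed, peak_bytes

# Usage with an ordinary least-squares baseline:
# mse, secs, peak = evaluate(lambda M, y: np.linalg.lstsq(M, y, rcond=None)[0], AX, y)
```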
Researcher Affiliation: Academia. "Seiyun Shin¹, Ilan Shomorony¹, and Han Zhao², ¹Department of ECE and ²Department of CS, University of Illinois at Urbana-Champaign, IL. {seiyuns2, ilans, hanzhao}@illinois.edu"
Pseudocode: Yes. Algorithm 1: ESTIMATELEVERAGESCORES(A, X) via Uniform Sampling; Algorithm 2: ESTIMATELEVERAGESCORES(A, X) via Data-dependent Sampling; Algorithm 3: LEVERAGESCORE(X); Algorithm 4: LEVERAGESCORESAMPLING(A, {ℓ̂_i([AX | y]), i ∈ [n]}); Algorithm 5: REGRESSIONSOLVER(AX, y).
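The routine names suggest the standard leverage-score machinery. As a reference point only (not a transcription of the paper's Algorithms 1-5, which estimate the scores from subsampled rows of A rather than computing them exactly), a minimal numpy sketch of exact statistical leverage scores and leverage-score row sampling for least squares is:

```python
import numpy as np

def leverage_scores(X):
    """Exact statistical leverage scores: l_i = ||U_i||^2 for thin SVD X = U S V^T,
    i.e. the diagonal of the hat matrix X (X^T X)^+ X^T."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return np.sum(U ** 2, axis=1)

def leverage_score_sampling(M, y, scores, m, seed=0):
    """Sample m rows of [M | y] with probability proportional to `scores`,
    rescale so the subsampled least-squares objective is unbiased, and solve."""
    rng = np.random.default_rng(seed)
    p = scores / scores.sum()
    idx = rng.choice(len(p), size=m, replace=True, p=p)
    scale = 1.0 / np.sqrt(m * p[idx])        # standard importance-sampling rescaling
    w_hat, *_ = np.linalg.lstsq(M[idx] * scale[:, None], y[idx] * scale, rcond=None)
    return w_hat
```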
Open Source Code: Yes. "Our code is publicly available online at https://github.com/seiyun-shin/gnn_node_subsampling."
Open Datasets: Yes. "We consider benchmark datasets from the Open Graph Benchmark (OGB) [29], Stanford Network Analysis Project (SNAP) [30], and House dataset [31]. For the MSE comparisons, we consider (1) the ogbl-ddi dataset from OGB, (2) the ego-Facebook dataset from SNAP, and (3) the House dataset."
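For reference, both graph datasets can be obtained programmatically. A sketch assuming the `ogb` Python package for ogbl-ddi and a locally downloaded SNAP edge list for ego-Facebook (SNAP distributes it as facebook_combined.txt); this mirrors the datasets named above but is not taken from the paper's repository:

```python
import numpy as np
from ogb.linkproppred import LinkPropPredDataset  # pip install ogb

# ogbl-ddi: drug-drug interaction graph from OGB
ddi = LinkPropPredDataset(name="ogbl-ddi")
graph = ddi[0]                          # dict with 'edge_index', 'num_nodes', ...
print(graph["num_nodes"], graph["edge_index"].shape)

# ego-Facebook: plain edge list from SNAP; download it first from
# https://snap.stanford.edu/data/ego-Facebook.html
edges = np.loadtxt("facebook_combined.txt", dtype=int)
n = edges.max() + 1                     # 4039 nodes for ego-Facebook
A = np.zeros((n, n))
A[edges[:, 0], edges[:, 1]] = 1.0
A[edges[:, 1], edges[:, 0]] = 1.0       # undirected graph: symmetric adjacency
```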
Dataset Splits: No. The paper mentions "training data" but does not specify exact split percentages, sample counts, or a detailed splitting methodology for training, validation, and test sets.
Hardware Specification: Yes. "For a fair comparison on Table 1, Table 2, and Table 3, we use the same regression solver and the same specification of 48 cores of an x86_64 processor with 503.74 GB of memory."
Software Dependencies: No. The paper mentions general software such as the scipy.sparse and numpy packages but does not provide specific version numbers for these or any other ancillary software components used in the experiments.
Experiment Setup: Yes. "Accordingly, to conduct controlled experiments, we synthetically generate the data matrix X, weight vector w, and noisy labels y by setting a linear relationship between labels and features: y := Xw + n. Here n denotes additive Gaussian noise with parameters (µ, σ) = (1, 10). Based upon the size of the two datasets described below, we consider data matrices X ∈ ℝ^{n×d} (with n ∈ {4267, 4039} and d = 100), where each row (i.e., each node's feature vector) is drawn i.i.d. according to the Cauchy distribution with parameters (x₀, γ) = (10, 100). Using the ReLU activation function and one hidden layer, Figure 2d plots the mean squared error as a function of the observation budget, shown as a percentage of the number of observed nodes in the graph."
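This generative model translates directly into numpy. A minimal sketch, reading "drawn according to the Cauchy distribution" as entrywise i.i.d. Cauchy and assuming (since the excerpt does not say) that w is drawn as a standard Gaussian:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4039, 100                       # ego-Facebook size; use n = 4267 for the other dataset

# Each node's feature vector: i.i.d. Cauchy entries with (x0, gamma) = (10, 100).
# numpy's standard_cauchy has location 0 and scale 1, so shift and scale it.
X = 10 + 100 * rng.standard_cauchy(size=(n, d))

# Assumption: weight vector drawn as standard Gaussian (not specified in the excerpt).
w = rng.standard_normal(d)

# Noisy labels via the stated linear model y := Xw + n,
# with additive Gaussian noise of mean 1 and standard deviation 10.
noise = rng.normal(loc=1.0, scale=10.0, size=n)
y = X @ w + noise
```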