Efficient Learning of Linear Graph Neural Networks via Node Subsampling
Authors: Seiyun Shin, Ilan Shomorony, Han Zhao
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We support our theoretical findings via numerical experiments to validate the performance of Algorithm 1 and Algorithm 2 in a non-asymptotic setting. Dataset and evaluation methods: We consider benchmark datasets from the Open Graph Benchmark (OGB) [29], Stanford Network Analysis Project (SNAP) [30], and House dataset [31]. For evaluation methods, we compute the mean squared error (MSE), wall-clock run-time, and peak memory usage of our two proposed schemes and five baselines. |
| Researcher Affiliation | Academia | Seiyun Shin¹, Ilan Shomorony¹, and Han Zhao²; ¹Department of ECE and ²Department of CS, University of Illinois at Urbana-Champaign, IL. {seiyuns2, ilans, hanzhao}@illinois.edu |
| Pseudocode | Yes | Algorithm 1 ESTIMATELEVERAGESCORES(A, X) via Uniform Sampling; Algorithm 2 ESTIMATELEVERAGESCORES(A, X) via Data-dependent Sampling; Algorithm 3 LEVERAGESCORE(X); Algorithm 4 LEVERAGESCORESAMPLING(A, {ℓ̂ᵢ([AX | y]), i ∈ [n]}); Algorithm 5 REGRESSIONSOLVER(AX, y). (An illustrative sketch of the leverage-score sampling step appears below the table.) |
| Open Source Code | Yes | Our code is publicly available online at https://github.com/seiyun-shin/gnn_node_subsampling. |
| Open Datasets | Yes | We consider benchmark datasets from the Open Graph Benchmark (OGB) [29], Stanford Network Analysis Project (SNAP) [30], and House dataset [31]. For the MSE comparisons, we consider (1) ogbl-ddi dataset from OGB, (2) ego-Facebook dataset from SNAP, and (3) House dataset. |
| Dataset Splits | No | The paper mentions 'training data' but does not specify exact split percentages, sample counts, or a detailed splitting methodology for training, validation, and test sets. |
| Hardware Specification | Yes | For a fair comparison on Table 1, Table 2, and Table 3, we use the same regression solver and use the same specification of 48 cores of an x86_64 processor with 503.74 GB memory. |
| Software Dependencies | No | The paper mentions general software like 'scipy.sparse package' and 'numpy package' but does not provide specific version numbers for these or any other ancillary software components used for the experiments. |
| Experiment Setup | Yes | Accordingly, to conduct controlled experiments, we synthetically generate data matrix X, weight vector w, and noisy labels y by setting the linear relationship between labels and features: y := Xw + n. Here n denotes an additive Gaussian noise with parameters (µ, σ) = (1, 10). Based upon the size of the two datasets described below, we consider the data matrices X ∈ ℝ^{n×d} (with n ∈ {4267, 4039} and d = 100), where each row (i.e., each node's feature vector) is drawn according to the Cauchy distribution with parameters (x₀, γ) = (10, 100), in an i.i.d. manner. Using the ReLU activation function and one hidden layer, Figure 2d plots the mean squared error as a function of the observation budget, shown as a percentage of the number of observed nodes in the graph. (A data-generation sketch under these parameters appears below the table.) |
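
The Pseudocode row above names leverage-score estimation and sampling as the core primitives. The snippet below is a minimal sketch of that sampling step: it computes exact leverage scores via a thin SVD and uses them to subsample and solve a least-squares problem in NumPy. Note that the paper's Algorithms 1 and 2 *estimate* these scores from subsampled rows of AX rather than computing them exactly, so this illustrates the underlying technique, not the authors' implementation; the function names are hypothetical.

```python
import numpy as np

def leverage_scores(M):
    """Exact leverage scores of M (n x d, n >= d): the squared row norms
    of the left singular vectors of M. They sum to rank(M)."""
    U, _, _ = np.linalg.svd(M, full_matrices=False)
    return np.sum(U ** 2, axis=1)

def leverage_score_sampling(M, y, budget, seed=None):
    """Sample `budget` rows of [M | y] with probability proportional to
    their leverage scores, rescale so the subsampled least-squares
    objective stays unbiased, and solve for the weight vector."""
    rng = np.random.default_rng(seed)
    scores = leverage_scores(np.column_stack([M, y]))
    p = scores / scores.sum()
    idx = rng.choice(len(p), size=budget, replace=True, p=p)
    scale = 1.0 / np.sqrt(budget * p[idx])  # importance-sampling reweighting
    w_hat, *_ = np.linalg.lstsq(M[idx] * scale[:, None], y[idx] * scale, rcond=None)
    return w_hat
```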
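
Likewise, the synthetic setup quoted in the Experiment Setup row can be approximated as below. The Cauchy feature parameters (x₀, γ) = (10, 100), the Gaussian noise parameters (µ, σ) = (1, 10), and the sizes n ∈ {4267, 4039}, d = 100 come from the row above; the excerpt does not say how the weight vector w is drawn, so standard normal entries are an assumption, and `make_synthetic` is a hypothetical name.

```python
import numpy as np

def make_synthetic(n=4267, d=100, seed=0):
    """Synthetic regression data mirroring the paper's controlled setup:
    i.i.d. Cauchy features and labels y = Xw + Gaussian noise."""
    rng = np.random.default_rng(seed)
    # Cauchy(x0=10, gamma=100) via a location-scale transform of the standard Cauchy
    X = 10.0 + 100.0 * rng.standard_cauchy(size=(n, d))
    # Assumption: the excerpt does not specify how w is generated
    w = rng.standard_normal(d)
    y = X @ w + rng.normal(loc=1.0, scale=10.0, size=n)
    return X, w, y
```

Chaining the two sketches gives a rough analogue of the paper's MSE-versus-budget comparison: `X, w, y = make_synthetic()`, then `w_hat = leverage_score_sampling(X, y, budget=int(0.1 * len(y)))`, and finally `np.mean((X @ w_hat - y) ** 2)` as the error at a 10% observation budget.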