Efficient Learning of Linear Graph Neural Networks via Node Subsampling

Authors: Seiyun Shin, Ilan Shomorony, Han Zhao

NeurIPS 2023

Reproducibility assessment. Each entry lists the variable, the result, and the supporting excerpt (LLM response).
Research Type: Experimental. "We support our theoretical findings via numerical experiments to validate the performance of Algorithm 1 and Algorithm 2 in a non-asymptotic setting. Dataset and evaluation methods: We consider benchmark datasets from the Open Graph Benchmark (OGB) [29], Stanford Network Analysis Project (SNAP) [30], and House dataset [31]. For evaluation methods, we compute the mean squared error (MSE), wall-clock run-time, and peak memory usage of our two proposed schemes and five baselines."
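Since every scheme is compared on MSE, wall-clock run-time, and peak memory usage, a minimal Python measurement harness along these lines is sketched below. This is an assumption about the shape of such a harness, not the paper's actual evaluation code; `solver` is a hypothetical stand-in for any of the compared methods.

```python
import time
import tracemalloc

import numpy as np

def evaluate(solver, M, y):
    """Report (MSE, wall-clock seconds, peak bytes) for one regression scheme.

    `solver` is a hypothetical stand-in: any callable mapping (M, y) to an
    estimated weight vector w_hat, e.g. a proposed scheme or a baseline.
    """
    tracemalloc.start()
    t0 = time.perf_counter()
    w_hat = solver(M, y)                         # run the scheme under test
    elapsed = time.perf_counter() - t0
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    mse = float(np.mean((M @ w_hat - y) ** 2))   # mean squared error of the fit
    return mse, elapsed, peak_bytes

# Usage with an ordinary least-squares baseline:
# mse, secs, peak = evaluate(lambda M, y: np.linalg.lstsq(M, y, rcond=None)[0], AX, y)
```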
Researcher Affiliation: Academia. "Seiyun Shin¹, Ilan Shomorony¹, and Han Zhao², ¹Department of ECE and ²Department of CS, University of Illinois at Urbana-Champaign, IL. {seiyuns2, ilans, hanzhao}@illinois.edu"
Pseudocode: Yes. Algorithm 1: ESTIMATELEVERAGESCORES(A, X) via Uniform Sampling; Algorithm 2: ESTIMATELEVERAGESCORES(A, X) via Data-dependent Sampling; Algorithm 3: LEVERAGESCORE(X); Algorithm 4: LEVERAGESCORESAMPLING(A, {ℓ̂_i([AX | y]), i ∈ [n]}); Algorithm 5: REGRESSIONSOLVER(AX, y).
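The routine names suggest the standard leverage-score machinery. As a reference point only (not a transcription of the paper's Algorithms 1-5, which estimate the scores from subsampled rows of A rather than computing them exactly), a minimal numpy sketch of exact statistical leverage scores and leverage-score row sampling for least squares is:

```python
import numpy as np

def leverage_scores(X):
    """Exact statistical leverage scores: l_i = ||U_i||^2 for thin SVD X = U S V^T,
    i.e. the diagonal of the hat matrix X (X^T X)^+ X^T."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return np.sum(U ** 2, axis=1)

def leverage_score_sampling(M, y, scores, m, seed=0):
    """Sample m rows of [M | y] with probability proportional to `scores`,
    rescale so the subsampled least-squares objective is unbiased, and solve."""
    rng = np.random.default_rng(seed)
    p = scores / scores.sum()
    idx = rng.choice(len(p), size=m, replace=True, p=p)
    scale = 1.0 / np.sqrt(m * p[idx])        # standard importance-sampling rescaling
    w_hat, *_ = np.linalg.lstsq(M[idx] * scale[:, None], y[idx] * scale, rcond=None)
    return w_hat
```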
Open Source Code: Yes. "Our code is publicly available online at https://github.com/seiyun-shin/gnn_node_subsampling."
Open Datasets: Yes. "We consider benchmark datasets from the Open Graph Benchmark (OGB) [29], Stanford Network Analysis Project (SNAP) [30], and House dataset [31]. For the MSE comparisons, we consider (1) the ogbl-ddi dataset from OGB, (2) the ego-Facebook dataset from SNAP, and (3) the House dataset."
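For reference, both graph datasets can be obtained programmatically. A sketch assuming the `ogb` Python package for ogbl-ddi and a locally downloaded SNAP edge list for ego-Facebook (SNAP distributes it as facebook_combined.txt); this mirrors the datasets named above but is not taken from the paper's repository:

```python
import numpy as np
from ogb.linkproppred import LinkPropPredDataset  # pip install ogb

# ogbl-ddi: drug-drug interaction graph from OGB
ddi = LinkPropPredDataset(name="ogbl-ddi")
graph = ddi[0]                          # dict with 'edge_index', 'num_nodes', ...
print(graph["num_nodes"], graph["edge_index"].shape)

# ego-Facebook: plain edge list from SNAP; download it first from
# https://snap.stanford.edu/data/ego-Facebook.html
edges = np.loadtxt("facebook_combined.txt", dtype=int)
n = edges.max() + 1                     # 4039 nodes for ego-Facebook
A = np.zeros((n, n))
A[edges[:, 0], edges[:, 1]] = 1.0
A[edges[:, 1], edges[:, 0]] = 1.0       # undirected graph: symmetric adjacency
```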
Dataset Splits: No. The paper mentions "training data" but does not specify exact split percentages, sample counts, or a detailed splitting methodology for training, validation, and test sets.
Hardware Specification: Yes. "For a fair comparison on Table 1, Table 2, and Table 3, we use the same regression solver and the same specification of 48 cores of an x86_64 processor with 503.74 GB of memory."
Software Dependencies: No. The paper mentions general software such as the scipy.sparse and numpy packages but does not provide specific version numbers for these or any other ancillary software components used in the experiments.
Experiment Setup: Yes. "Accordingly, to conduct controlled experiments, we synthetically generate the data matrix X, weight vector w, and noisy labels y by setting a linear relationship between labels and features: y := Xw + n. Here n denotes additive Gaussian noise with parameters (µ, σ) = (1, 10). Based upon the size of the two datasets described below, we consider data matrices X ∈ ℝ^{n×d} (with n ∈ {4267, 4039} and d = 100), where each row (i.e., each node's feature vector) is drawn i.i.d. according to the Cauchy distribution with parameters (x₀, γ) = (10, 100). Using the ReLU activation function and one hidden layer, Figure 2d plots the mean squared error as a function of the observation budget, shown as a percentage of the number of observed nodes in the graph."
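This generative model translates directly into numpy. A minimal sketch, reading "drawn according to the Cauchy distribution" as entrywise i.i.d. Cauchy and assuming (since the excerpt does not say) that w is drawn as a standard Gaussian:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4039, 100                       # ego-Facebook size; use n = 4267 for the other dataset

# Each node's feature vector: i.i.d. Cauchy entries with (x0, gamma) = (10, 100).
# numpy's standard_cauchy has location 0 and scale 1, so shift and scale it.
X = 10 + 100 * rng.standard_cauchy(size=(n, d))

# Assumption: weight vector drawn as standard Gaussian (not specified in the excerpt).
w = rng.standard_normal(d)

# Noisy labels via the stated linear model y := Xw + n,
# with additive Gaussian noise of mean 1 and standard deviation 10.
noise = rng.normal(loc=1.0, scale=10.0, size=n)
y = X @ w + noise
```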