Implicit SVD for Graph Representation Learning

Authors: Sami Abu-El-Haija, Hesham Mostafa, Marcel Nassar, Valentino Crespi, Greg Ver Steeg, Aram Galstyan

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Recent improvements in the performance of state-of-the-art (SOTA) methods for Graph Representational Learning (GRL) have come at the cost of significant computational resource requirements for training, e.g., for calculating gradients via backprop over many data epochs. Meanwhile, Singular Value Decomposition (SVD) can find closed-form solutions to convex problems, using merely a handful of epochs. In this paper, we make GRL more computationally tractable for those with modest hardware. We design a framework that computes SVD of implicitly defined matrices, and apply this framework to several GRL tasks. For each task, we derive linear approximation of a SOTA model, where we design (expensive-to-store) matrix M and train the model, in closed-form, via SVD of M, without calculating entries of M. By converging to a unique point in one step, and without calculating gradients, our models show competitive empirical test performance over various graphs such as article citation and biological interaction networks. More importantly, SVD can initialize a deeper model, that is architected to be nonlinear almost everywhere, though behaves linearly when its parameters reside on a hyperplane, onto which SVD initializes. The deeper model can then be fine-tuned within only a few epochs. Overall, our procedure trains hundreds of times faster than state-of-the-art methods, while competing on empirical test performance. We open-source our implementation at: https://github.com/samihaija/isvd (A minimal sketch of computing the SVD of an implicitly defined matrix appears after this table.)
Researcher Affiliation | Collaboration | Sami Abu-El-Haija, USC Information Sciences Institute, sami@haija.org; Hesham Mostafa and Marcel Nassar, Intel Labs, {hesham.mostafa,marcel.nassar}@intel.com; Valentino Crespi, Greg Ver Steeg, and Aram Galstyan, USC Information Sciences Institute, {vcrespi,gregv,galstyan}@isi.edu
Pseudocode | No | The paper includes a code snippet in Figure 1 for symbolic matrix representation, but it is not presented as formal pseudocode for an algorithm. It mentions that their SVD implementation follows a prototype in the Appendix, but the Appendix content is not provided.
Open Source Code | Yes | We open-source our implementation at: https://github.com/samihaija/isvd
Open Datasets | Yes | We download and experiment on 9 datasets summarized in Table 1. ... For these datasets, we use the train-test splits of Abu-El-Haija et al. [2018]. ... We use the official train-test-validation splits and evaluator of OGB.
Dataset Splits | Yes | We download and experiment on 9 datasets summarized in Table 1. ... For these datasets, we use the train-test splits of Abu-El-Haija et al. [2018]. ... We use the official train-test-validation splits and evaluator of OGB. ... Then, we PCA the resulting matrix to 1000 dimensions, which forms our new X. The second SVD run is to train the classification model parameters Ŵ. From the PCA-ed X, we construct the implicit matrix M̂^(NC) with L = 15 layers and obtain Ŵ = V S⁺ Uᵀ Y[train] with U, S, V ← SVD_100(M̂^(NC)[train]), per the RHS of Eq. 8. (A sketch of this closed-form solve appears after this table.)
Hardware Specification | Yes | For time comparisons, we train all models on Tesla K80.
Software Dependencies | No | The paper mentions software frameworks like TensorFlow and PyTorch, and that their implementation is a Python software library, but it does not provide specific version numbers for any of these software components.
Experiment Setup | Yes | For our models, the row labeled iSVD_100(M̂^(NC)), we run our implicit SVD twice per graph. The first run incorporates structural information: we (implicitly) construct M̂^(NE) with λ = 0.05 and C = 3, then obtain L, R ← SVD_64(M̂^(NE)), per Eq. 5. Then, we concatenate L and R into X. Then, we PCA the resulting matrix to 1000 dimensions, which forms our new X. The second SVD run is to train the classification model parameters Ŵ. From the PCA-ed X, we construct the implicit matrix M̂^(NC) with L = 15 layers and obtain Ŵ = V S⁺ Uᵀ Y[train] with U, S, V ← SVD_100(M̂^(NC)[train]), per the RHS of Eq. 8. For our second '+ dropout' model variant, we (implicitly) augment the data as M̂^(NC) ← [M̂^(NC) ‖ P_D(M̂^(NC))] and update the indices as [train] ← [train ‖ train], then similarly learn via U, S, V ← SVD_100(M̂^(NC)[train]) and Ŵ = V S⁺ Uᵀ Y[train]. ... We propose f with θ = {µ, s}: f_{µ,s}(i, j) = E_{x∼N(µ,s)} ⟨U_i, V_j⟩_{S^x} = U_iᵀ E_x[S^x] V_j = U_iᵀ (∫_Ω S^x N(x | µ, s) dx) V_j (Eq. 9), where N is the truncated normal distribution (we truncate to Ω = [0.5, 2]). (Sketches of the implicit SVD, the closed-form solve, and the Eq. 9 expectation appear after this table.)
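First sketch (Research Type row): the framework's core idea is a truncated SVD of a matrix M that is defined only through matrix-vector products, so M's entries are never stored. The snippet below is a minimal illustration of that idea using SciPy, not the authors' implementation (their library lives in the repository linked above); the choice M = A·X, the toy sizes, and the variable names are assumptions made for this example.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, svds

# Toy stand-ins: a sparse adjacency A and dense features X. The implicit
# matrix here is assumed to be M = A @ X; its entries are never materialized.
n, d, k = 5000, 256, 64
A = sp.random(n, n, density=1e-3, format="csr")
X = np.random.randn(n, d)

def matvec(v):            # computes M @ v as A @ (X @ v)
    return A @ (X @ v)

def rmatvec(u):           # computes M.T @ u as X.T @ (A.T @ u)
    return X.T @ (A.T @ u)

M_op = LinearOperator(shape=(n, d), matvec=matvec, rmatvec=rmatvec)

# Rank-k (truncated) SVD driven purely by the products above.
U, S, Vt = svds(M_op, k=k)
```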
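Second sketch (Dataset Splits and Experiment Setup rows): the quoted rule Ŵ = V S⁺ Uᵀ Y[train] is a pseudo-inverse least-squares solve computed from a rank-k SVD. The snippet below is a hedged illustration of that algebra on toy data; the function name and the sizes are invented for the example and are not taken from the paper's code.

```python
import numpy as np

def closed_form_weights(U, S, Vt, Y, eps=1e-10):
    """Compute W_hat = V @ diag(S)^+ @ U.T @ Y, the minimum-norm
    least-squares solution of M @ W ~= Y, given a (truncated) SVD
    M ~= U @ diag(S) @ Vt."""
    S_pinv = np.where(S > eps, 1.0 / S, 0.0)   # pseudo-inverse of singular values
    return Vt.T @ (S_pinv[:, None] * (U.T @ Y))

# Toy usage: M plays the role of the (train-row) design matrix,
# Y the one-hot training labels.
rng = np.random.default_rng(0)
M = rng.standard_normal((200, 50))
Y = np.eye(5)[rng.integers(0, 5, size=200)]
U, S, Vt = np.linalg.svd(M, full_matrices=False)
W_hat = closed_form_weights(U, S, Vt, Y)       # shape (50, 5)
```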
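Third sketch (Eq. 9 in the Experiment Setup row): the expectation E_x[S^x] over a normal distribution truncated to Ω = [0.5, 2] can be evaluated by one-dimensional numerical integration per singular value, assuming S^x denotes the singular values raised element-wise to the power x. This is an illustrative computation, not the authors' code; the helper names, parameter names, and toy factorization are assumptions for the example.

```python
import numpy as np
from scipy.stats import truncnorm
from scipy.integrate import quad

OMEGA = (0.5, 2.0)  # truncation interval quoted in the setup

def expected_S_power(s, mu, sigma, omega=OMEGA):
    """E_{x ~ N(mu, sigma) truncated to omega}[s**x], per singular value,
    by 1-D quadrature of s**x against the truncated-normal density."""
    a, b = (omega[0] - mu) / sigma, (omega[1] - mu) / sigma
    dist = truncnorm(a, b, loc=mu, scale=sigma)
    return np.array([quad(lambda x, si=si: (si ** x) * dist.pdf(x), *omega)[0]
                     for si in s])

def f_mu_s(U, s, V, i, j, mu, sigma):
    """f_{mu,s}(i, j) = U_i^T E_x[S^x] V_j (Eq. 9), with E_x[S^x] diagonal,
    so the product reduces to an elementwise-weighted inner product."""
    return U[i] @ (expected_S_power(s, mu, sigma) * V[j])

# Toy usage with a random rank-8 factorization.
rng = np.random.default_rng(0)
U_full, s_full, Vt_full = np.linalg.svd(rng.standard_normal((30, 20)),
                                        full_matrices=False)
score = f_mu_s(U_full[:, :8], s_full[:8], Vt_full[:8].T,
               i=0, j=3, mu=1.0, sigma=0.25)
```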