Implicit SVD for Graph Representation Learning

Authors: Sami Abu-El-Haija, Hesham Mostafa, Marcel Nassar, Valentino Crespi, Greg Ver Steeg, Aram Galstyan

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Recent improvements in the performance of state-of-the-art (SOTA) methods for Graph Representational Learning (GRL) have come at the cost of significant computational resource requirements for training, e.g., for calculating gradients via backprop over many data epochs. Meanwhile, Singular Value Decomposition (SVD) can find closed-form solutions to convex problems, using merely a handful of epochs. In this paper, we make GRL more computationally tractable for those with modest hardware. We design a framework that computes SVD of implicitly defined matrices, and apply this framework to several GRL tasks. For each task, we derive linear approximation of a SOTA model, where we design (expensive-to-store) matrix M and train the model, in closed-form, via SVD of M, without calculating entries of M. By converging to a unique point in one step, and without calculating gradients, our models show competitive empirical test performance over various graphs such as article citation and biological interaction networks. More importantly, SVD can initialize a deeper model, that is architected to be nonlinear almost everywhere, though behaves linearly when its parameters reside on a hyperplane, onto which SVD initializes. The deeper model can then be fine-tuned within only a few epochs. Overall, our procedure trains hundreds of times faster than state-of-the-art methods, while competing on empirical test performance. We open-source our implementation at: https://github.com/samihaija/isvd (A minimal sketch of computing the SVD of an implicitly defined matrix appears after this table.)
Researcher Affiliation | Collaboration | Sami Abu-El-Haija, USC Information Sciences Institute, sami@haija.org; Hesham Mostafa and Marcel Nassar, Intel Labs, {hesham.mostafa,marcel.nassar}@intel.com; Valentino Crespi, Greg Ver Steeg, and Aram Galstyan, USC Information Sciences Institute, {vcrespi,gregv,galstyan}@isi.edu
Pseudocode | No | The paper includes a code snippet in Figure 1 for symbolic matrix representation, but it is not presented as formal pseudocode for an algorithm. It mentions that their SVD implementation follows a prototype in the Appendix, but the Appendix content is not provided.
Open Source Code | Yes | We open-source our implementation at: https://github.com/samihaija/isvd
Open Datasets | Yes | We download and experiment on 9 datasets summarized in Table 1. ... For these datasets, we use the train-test splits of Abu-El-Haija et al. [2018]. ... We use the official train-test-validation splits and evaluator of OGB.
Dataset Splits | Yes | We download and experiment on 9 datasets summarized in Table 1. ... For these datasets, we use the train-test splits of Abu-El-Haija et al. [2018]. ... We use the official train-test-validation splits and evaluator of OGB. ... Then, we PCA the resulting matrix to 1000 dimensions, which forms our new X. The second SVD run is to train the classification model parameters Ŵ. From the PCA-ed X, we construct the implicit matrix M̂^(NC) with L = 15 layers and obtain Ŵ = V S⁺ Uᵀ Y[train] with U, S, V ← SVD_100(M̂^(NC)[train]), per the RHS of Eq. 8. (A sketch of this closed-form solve appears after this table.)
Hardware Specification | Yes | For time comparisons, we train all models on Tesla K80.
Software Dependencies | No | The paper mentions software frameworks like TensorFlow and PyTorch, and that their implementation is a Python software library, but it does not provide specific version numbers for any of these software components.
Experiment Setup | Yes | For our models, the row labeled iSVD_100(M̂^(NC)), we run our implicit SVD twice per graph. The first run incorporates structural information: we (implicitly) construct M̂^(NE) with λ = 0.05 and C = 3, then obtain L, R ← SVD_64(M̂^(NE)), per Eq. 5. Then, we concatenate L and R into X. Then, we PCA the resulting matrix to 1000 dimensions, which forms our new X. The second SVD run is to train the classification model parameters Ŵ. From the PCA-ed X, we construct the implicit matrix M̂^(NC) with L = 15 layers and obtain Ŵ = V S⁺ Uᵀ Y[train] with U, S, V ← SVD_100(M̂^(NC)[train]), per the RHS of Eq. 8. For our second '+ dropout' model variant, we (implicitly) augment the data as M̂^(NC) ← [M̂^(NC) ‖ P_D(M̂^(NC))] and update the indices as [train] ← [train ‖ train], then similarly learn via U, S, V ← SVD_100(M̂^(NC)[train]) and Ŵ = V S⁺ Uᵀ Y[train]. ... We propose f with θ = {µ, s}: f_{µ,s}(i, j) = E_{x∼N(µ,s)} ⟨U_i, V_j⟩_{S^x} = U_iᵀ E_x[S^x] V_j = U_iᵀ (∫_Ω S^x N(x | µ, s) dx) V_j (Eq. 9), where N is the truncated normal distribution (we truncate to Ω = [0.5, 2]). (Sketches of the implicit SVD, the closed-form solve, and the Eq. 9 expectation appear after this table.)
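First sketch (Research Type row): the framework's core idea is a truncated SVD of a matrix M that is defined only through matrix-vector products, so M's entries are never stored. The snippet below is a minimal illustration of that idea using SciPy, not the authors' implementation (their library lives in the repository linked above); the choice M = A·X, the toy sizes, and the variable names are assumptions made for this example.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, svds

# Toy stand-ins: a sparse adjacency A and dense features X. The implicit
# matrix here is assumed to be M = A @ X; its entries are never materialized.
n, d, k = 5000, 256, 64
A = sp.random(n, n, density=1e-3, format="csr")
X = np.random.randn(n, d)

def matvec(v):            # computes M @ v as A @ (X @ v)
    return A @ (X @ v)

def rmatvec(u):           # computes M.T @ u as X.T @ (A.T @ u)
    return X.T @ (A.T @ u)

M_op = LinearOperator(shape=(n, d), matvec=matvec, rmatvec=rmatvec)

# Rank-k (truncated) SVD driven purely by the products above.
U, S, Vt = svds(M_op, k=k)
```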
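Second sketch (Dataset Splits and Experiment Setup rows): the quoted rule Ŵ = V S⁺ Uᵀ Y[train] is a pseudo-inverse least-squares solve computed from a rank-k SVD. The snippet below is a hedged illustration of that algebra on toy data; the function name and the sizes are invented for the example and are not taken from the paper's code.

```python
import numpy as np

def closed_form_weights(U, S, Vt, Y, eps=1e-10):
    """Compute W_hat = V @ diag(S)^+ @ U.T @ Y, the minimum-norm
    least-squares solution of M @ W ~= Y, given a (truncated) SVD
    M ~= U @ diag(S) @ Vt."""
    S_pinv = np.where(S > eps, 1.0 / S, 0.0)   # pseudo-inverse of singular values
    return Vt.T @ (S_pinv[:, None] * (U.T @ Y))

# Toy usage: M plays the role of the (train-row) design matrix,
# Y the one-hot training labels.
rng = np.random.default_rng(0)
M = rng.standard_normal((200, 50))
Y = np.eye(5)[rng.integers(0, 5, size=200)]
U, S, Vt = np.linalg.svd(M, full_matrices=False)
W_hat = closed_form_weights(U, S, Vt, Y)       # shape (50, 5)
```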
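Third sketch (Eq. 9 in the Experiment Setup row): the expectation E_x[S^x] over a normal distribution truncated to Ω = [0.5, 2] can be evaluated by one-dimensional numerical integration per singular value, assuming S^x denotes the singular values raised element-wise to the power x. This is an illustrative computation, not the authors' code; the helper names, parameter names, and toy factorization are assumptions for the example.

```python
import numpy as np
from scipy.stats import truncnorm
from scipy.integrate import quad

OMEGA = (0.5, 2.0)  # truncation interval quoted in the setup

def expected_S_power(s, mu, sigma, omega=OMEGA):
    """E_{x ~ N(mu, sigma) truncated to omega}[s**x], per singular value,
    by 1-D quadrature of s**x against the truncated-normal density."""
    a, b = (omega[0] - mu) / sigma, (omega[1] - mu) / sigma
    dist = truncnorm(a, b, loc=mu, scale=sigma)
    return np.array([quad(lambda x, si=si: (si ** x) * dist.pdf(x), *omega)[0]
                     for si in s])

def f_mu_s(U, s, V, i, j, mu, sigma):
    """f_{mu,s}(i, j) = U_i^T E_x[S^x] V_j (Eq. 9), with E_x[S^x] diagonal,
    so the product reduces to an elementwise-weighted inner product."""
    return U[i] @ (expected_S_power(s, mu, sigma) * V[j])

# Toy usage with a random rank-8 factorization.
rng = np.random.default_rng(0)
U_full, s_full, Vt_full = np.linalg.svd(rng.standard_normal((30, 20)),
                                        full_matrices=False)
score = f_mu_s(U_full[:, :8], s_full[:8], Vt_full[:8].T,
               i=0, j=3, mu=1.0, sigma=0.25)
```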