Large-Scale Representation Learning on Graphs via Bootstrapping

Authors: Shantanu Thakoor, Corentin Tallec, Mohammad Gheshlaghi Azar, Mehdi Azabou, Eva L Dyer, Remi Munos, Petar Veličković, Michal Valko

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present an extensive empirical study of performance and scalability, showing that BGRL is effective across a wide range of settings from frozen linear evaluation to semi-supervised learning, and both when performing full-graph training and training on subsampled node neighborhoods.
Researcher Affiliation | Collaboration | Shantanu Thakoor (DeepMind), Corentin Tallec (DeepMind), Mohammad Gheshlaghi Azar (DeepMind), Mehdi Azabou (Georgia Institute of Technology), Eva Dyer (Georgia Institute of Technology), Rémi Munos (DeepMind), Petar Veličković (DeepMind), Michal Valko (DeepMind)
Pseudocode | No | The paper describes the algorithm components and update steps in text and with a diagram (Figure 1), but no formal pseudocode block or algorithm listing is provided. (A sketch reconstructed from that description is given after this table.)
Open Source Code | Yes | Algorithm implementation and experiment code for most tasks can be found at https://github.com/nerdslab/bgrl while code for our solution on MAG240M has been open-sourced as part of the KDD Cup 2021 (Addanki et al., 2021) at https://github.com/deepmind/deepmind-research/tree/master/ogb_lsc/mag.
Open Datasets | Yes | We analyze the performance of BGRL on a set of 7 standard transductive and inductive benchmark tasks, as well as in the very high-data regime by evaluating on the MAG240M dataset (Hu et al., 2021). ... Dataset sizes are summarized in Table 2 and described further in Appendix E.
Dataset Splits | Yes | WikiCS ... This dataset comes with 20 canonical train/valid/test splits, which we use directly.
Hardware Specification | Yes | OOM indicates running out of memory on a 16GB V100 GPU. (Table 1 footnote) Most experiments finish within 30 minutes on a single V100 GPU, and thus are easy to verify with few resources.
Software Dependencies | No | In all our experiments, we use the AdamW optimizer (Kingma & Ba, 2015; Gugger & Howard, 2018) with weight decay set to 10⁻⁵, and all models initialized using Glorot initialization (Glorot & Bengio, 2010). For the smaller datasets of WikiCS, Amazon Computers/Photos, and Coauthor CS/Physics, we use an ℓ2-regularized Logistic Regression classifier from Scikit-Learn (Pedregosa et al., 2011) using the liblinear solver. While software components are mentioned, specific version numbers for these are not provided. (A sketch of this evaluation protocol follows the table.)
Experiment Setup | Yes | In all our experiments, we use the AdamW optimizer (Kingma & Ba, 2015; Gugger & Howard, 2018) with weight decay set to 10⁻⁵, and all models initialized using Glorot initialization (Glorot & Bengio, 2010). The BGRL predictor pθ used to predict the embedding of nodes across views is fixed to be a Multilayer Perceptron (MLP) with a single hidden layer. The decay rate τ controlling the rate of updates of the BGRL target parameters φ is initialized to 0.99 and gradually increased to 1.0 over the course of training following a cosine schedule. Other model architecture and training details vary per dataset and are described further below. The augmentation hyperparameters pf1,2 and pe1,2 are reported below. Table 8 describes hyperparameter and architectural details for most of our experimental setups with BGRL. (A sketch of the τ schedule follows the table.)
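
Since the paper provides no formal algorithm listing, the following is a minimal sketch of one BGRL update step as described in its text and Figure 1: two augmented views, an online encoder plus predictor, a stop-gradient target encoder, a cosine-similarity objective, and an exponential-moving-average (EMA) update of the target parameters. All names and signatures (augment, online_encoder, target_encoder, predictor, tau) are illustrative assumptions, not identifiers from the released code.

```python
# Hedged sketch of a single BGRL update step, reconstructed from the paper's description.
# Names and signatures are assumptions for illustration, not the authors' implementation.
import torch
import torch.nn.functional as F

def bgrl_step(online_encoder, target_encoder, predictor, augment, optimizer, graph, tau):
    # Two stochastic augmentations (e.g. feature masking / edge dropping) of the same graph.
    view1, view2 = augment(graph), augment(graph)

    # Online branch: encode each view and predict the target embedding of the other view.
    q1 = predictor(online_encoder(view1))
    q2 = predictor(online_encoder(view2))

    # Target branch: encode both views without tracking gradients (stop-gradient).
    with torch.no_grad():
        y1 = target_encoder(view1)
        y2 = target_encoder(view2)

    # Symmetrized negative cosine similarity between online predictions and target embeddings.
    loss = -(F.cosine_similarity(q1, y2, dim=-1).mean() +
             F.cosine_similarity(q2, y1, dim=-1).mean())

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Target parameters follow the online parameters via an exponential moving average.
    with torch.no_grad():
        for p_t, p_o in zip(target_encoder.parameters(), online_encoder.parameters()):
            p_t.mul_(tau).add_(p_o, alpha=1.0 - tau)
    return float(loss)
```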
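
The ℓ2-regularized logistic-regression evaluation quoted in the Software Dependencies row can be sketched as below. The frozen node embeddings, label arrays, index splits, and regularization grid are illustrative assumptions; only the liblinear solver and the ℓ2 penalty come from the quoted text.

```python
# Hedged sketch of the frozen linear evaluation protocol on precomputed BGRL embeddings.
# Embeddings, labels, splits, and the C grid are assumptions; solver/penalty follow the quote.
import numpy as np
from sklearn.linear_model import LogisticRegression

def linear_evaluation(embeddings, labels, train_idx, val_idx, test_idx):
    best_val, best_clf = -np.inf, None
    for c in (0.01, 0.1, 1.0, 10.0):  # assumed sweep over inverse regularization strengths
        clf = LogisticRegression(solver="liblinear", penalty="l2", C=c, max_iter=1000)
        clf.fit(embeddings[train_idx], labels[train_idx])
        val_acc = clf.score(embeddings[val_idx], labels[val_idx])
        if val_acc > best_val:
            best_val, best_clf = val_acc, clf
    # Report test accuracy of the classifier selected on the validation split.
    return best_clf.score(embeddings[test_idx], labels[test_idx])
```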
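
Finally, the cosine schedule for the target decay rate τ described in the Experiment Setup row (initialized at 0.99 and annealed to 1.0) admits a simple closed form; the step-indexed formulation below is an assumption about how the schedule is parameterized.

```python
# Hedged sketch of the cosine schedule for the EMA decay rate tau:
# tau starts at 0.99 and is gradually increased to 1.0 over training, per the quoted setup.
import math

def tau_at_step(step, total_steps, tau_base=0.99):
    # Interpolates from tau_base at step 0 to 1.0 at the final step along a cosine curve.
    return 1.0 - (1.0 - tau_base) * (math.cos(math.pi * step / total_steps) + 1.0) / 2.0
```

At step 0 this returns 0.99 and at the final step exactly 1.0, matching the endpoints quoted above. The quoted optimizer is AdamW with weight decay 10⁻⁵ (e.g. torch.optim.AdamW(params, lr=..., weight_decay=1e-5)), with learning rates and encoder architectures varying per dataset as in Table 8 of the paper.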