Contrastive Multi-View Representation Learning on Graphs
Authors: Kaveh Hassani, Amir Hosein Khasahmadi
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We achieve new state-of-the-art results in self-supervised learning on 8 out of 8 node and graph classification benchmarks under the linear evaluation protocol. For example, on Cora (node) and Reddit-Binary (graph) classification benchmarks, we achieve 86.8% and 84.5% accuracy, which are 5.5% and 2.4% relative improvements over previous state-of-the-art. |
| Researcher Affiliation | Collaboration | 1Autodesk AI Lab, Toronto, Canada 2Vector Institute, Toronto, Canada. Correspondence to: Kaveh Hassani <kaveh.hassani@autodesk.com>, Amir Hosein Khasahmadi <amir.khasahmadi@autodesk.com>. |
| Pseudocode | Yes | Algorithm 1 (contrastive multi-view graph representation learning). Input: augmentations τ_α and τ_β, sampler Γ, pooling P, discriminator D, loss L, encoders g_θ, g_ω, f_ψ, f_φ, and training graphs {g = (X, A) ∈ G}. For each sampled batch {g_k}_{k=1}^N ⊆ G, compute the encodings for k = 1..N: X_k, A_k = Γ(g_k) (sub-sample graph); V^α_k = τ_α(A_k) (first view); Z^α_k = g_θ(X_k, V^α_k) (node rep.); H^α_k = f_ψ(Z^α_k) (projected node rep.); h^α_k = f_φ(P(Z^α_k)) (projected graph rep.); V^β_k = τ_β(A_k) (second view); Z^β_k = g_ω(X_k, V^β_k) (node rep.); H^β_k = f_ψ(Z^β_k) (projected node rep.); h^β_k = f_φ(P(Z^β_k)) (projected graph rep.). Then compute pairwise similarities for i, j = 1..N: s^α_ij = D(h^α_i, H^β_j) and s^β_ij = D(h^β_i, H^α_j), and the gradients of Σ_{i,j} [L(s^α_ij) + L(s^β_ij)]. Return [H^α_g + H^β_g, h^α_g + h^β_g] for g ∈ G. (A runnable sketch of this procedure follows the table.) |
| Open Source Code | No | The paper does not provide an explicit statement or link for the open-sourcing of the described methodology's code. |
| Open Datasets | Yes | For node classification, we use Citeseer, Cora, and Pubmed citation networks (Sen et al., 2008) where documents (nodes) are connected through citations (edges). For graph classification, we use the following: MUTAG (Kriege & Mutzel, 2012) containing mutagenic compounds, PTC (Kriege & Mutzel, 2012) containing compounds tested for carcinogenicity, Reddit-Binary (Yanardag & Vishwanathan, 2015) connecting users (nodes) through responses (edges) in Reddit online discussions, and IMDB-Binary and IMDB-Multi (Yanardag & Vishwanathan, 2015) connecting actors/actresses (nodes) based on movie appearances (edges). (Loading snippets for these datasets appear below the table.) |
| Dataset Splits | Yes | For node classification, we follow DGI and report the mean classification accuracy with standard deviation on the test nodes over 50 runs of training followed by a linear model. For graph classification, we follow InfoGraph and report the mean 10-fold cross-validation accuracy with standard deviation after 5 runs followed by a linear SVM. The linear classifier is trained using cross-validation on the training folds, and the best mean classification accuracy is reported. (See the evaluation sketch below the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments, only general statements about the computational environment. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | We initialize the parameters using Xavier initialization (Glorot & Bengio, 2010) and train the model using the Adam optimizer (Kingma & Ba, 2014) with an initial learning rate of 0.001. For fair comparison, we follow InfoGraph for graph classification and choose the number of GCN layers, number of epochs, batch size, and the C parameter of the SVM from [2, 4, 8, 12], [10, 20, 40, 100], [32, 64, 128, 256], and [10^-3, 10^-2, ..., 10^2, 10^3], respectively. For node classification, we follow DGI and set the number of GCN layers and the number of epochs to 1 and 2,000, respectively, and choose the batch size from [2, 4, 8]. We also use early stopping with a patience of 20. Finally, we set the hidden dimension of both node and graph representations to 512. (The search space is collected in a config sketch below the table.) |
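
To make the quoted pseudocode concrete, below is a minimal single-graph sketch in PyTorch. The names `GCNLayer` and `mvgrl_step` are ours, not the authors'; the discriminator is simplified to a dot product and the pooling P to a mean over nodes, so this is an illustration of the cross-view contrast, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """Single GCN layer: Z = PReLU(A_hat @ X @ W), where A_hat is a normalized view matrix."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)
        self.act = nn.PReLU()

    def forward(self, x, adj):
        return self.act(adj @ self.lin(x))

def mvgrl_step(x, adj_a, adj_b, g_theta, g_omega, f_psi, f_phi):
    """One forward pass of Algorithm 1 for a single graph.

    adj_a / adj_b are the two structural views tau_alpha(A), tau_beta(A),
    e.g. the normalized adjacency and a diffusion matrix.
    """
    z_a = g_theta(x, adj_a)                        # Z^alpha: node rep., first view
    z_b = g_omega(x, adj_b)                        # Z^beta:  node rep., second view
    h_nodes_a, h_nodes_b = f_psi(z_a), f_psi(z_b)  # H: projected node reps (shared MLP)
    h_graph_a = f_phi(z_a.mean(dim=0))             # h: mean-pooled, projected graph rep
    h_graph_b = f_phi(z_b.mean(dim=0))
    # Discriminator simplified to a dot product: score each node of one view
    # against the graph summary of the other view (cross-view contrast).
    s_a = h_nodes_b @ h_graph_a                    # s^alpha scores
    s_b = h_nodes_a @ h_graph_b                    # s^beta scores
    # Final representations sum the two views, as in the algorithm's return step.
    return s_a, s_b, h_nodes_a + h_nodes_b, h_graph_a + h_graph_b
```

In the full method, scores for matching graph/node pairs act as positives and scores against other graphs in the batch act as negatives under the chosen loss L.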
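
All of the datasets listed above are available through common graph-learning libraries. The paper does not state its tooling, but assuming PyTorch Geometric is installed, they can be loaded as follows (PTC is exposed there as the `PTC_MR` variant):

```python
from torch_geometric.datasets import Planetoid, TUDataset

# Node classification benchmarks (citation networks)
cora = Planetoid(root="data/Planetoid", name="Cora")
citeseer = Planetoid(root="data/Planetoid", name="CiteSeer")
pubmed = Planetoid(root="data/Planetoid", name="PubMed")

# Graph classification benchmarks
mutag = TUDataset(root="data/TUDataset", name="MUTAG")
ptc = TUDataset(root="data/TUDataset", name="PTC_MR")
reddit_b = TUDataset(root="data/TUDataset", name="REDDIT-BINARY")
imdb_b = TUDataset(root="data/TUDataset", name="IMDB-BINARY")
imdb_m = TUDataset(root="data/TUDataset", name="IMDB-MULTI")
```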
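
The graph-classification evaluation protocol quoted in the Dataset Splits row (frozen embeddings, linear SVM, C tuned on the training folds, mean 10-fold accuracy) can be sketched with scikit-learn. The function and variable names here are ours, and the random seed is arbitrary:

```python
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.svm import LinearSVC

def evaluate_graph_embeddings(embeddings, labels):
    """Mean/std 10-fold accuracy of a linear SVM on frozen graph embeddings."""
    param_grid = {"C": [10.0 ** k for k in range(-3, 4)]}  # 1e-3 ... 1e3, as in the paper
    svm = GridSearchCV(LinearSVC(), param_grid, cv=5, scoring="accuracy")
    outer = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    scores = cross_val_score(svm, embeddings, labels, cv=outer)
    return scores.mean(), scores.std()
```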
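
For reference, the hyperparameter search space and fixed settings from the Experiment Setup row, collected into plain Python dictionaries (the values are quoted from the paper; the dictionary layout is our own convention):

```python
# Graph classification: values searched over, following InfoGraph
graph_clf_search_space = {
    "num_gcn_layers": [2, 4, 8, 12],
    "epochs": [10, 20, 40, 100],
    "batch_size": [32, 64, 128, 256],
    "svm_C": [1e-3, 1e-2, 1e-1, 1e0, 1e1, 1e2, 1e3],
}

# Node classification: fixed settings, following DGI
node_clf_config = {
    "num_gcn_layers": 1,
    "epochs": 2000,
    "batch_size": [2, 4, 8],        # searched
    "early_stopping_patience": 20,
    "hidden_dim": 512,              # node and graph representations
    "optimizer": "Adam",
    "learning_rate": 1e-3,
    "init": "Xavier",
}
```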