Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Tackling Provably Hard Representative Selection via Graph Neural Networks
Authors: Mehran Kazemi, Anton Tsitsulin, Hossein Esfandiari, Mohammadhossein Bateni, Deepak Ramachandran, Bryan Perozzi, Vahab Mirrokni
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we demonstrate the effectiveness of RS-GNN on problems with predefined graph structures as well as problems with graphs induced from node feature similarities, by showing that RS-GNN achieves significant improvements over established baselines on a suite of eight benchmarks. |
| Researcher Affiliation | Industry | Mehran Kazemi EMAIL Anton Tsitsulin EMAIL Hossein Esfandiari EMAIL Mohammad Hossein Bateni EMAIL Deepak Ramachandran EMAIL Bryan Perozzi EMAIL Vahab Mirrokni EMAIL Google Research |
| Pseudocode | Yes | Algorithm 1 The training procedure of RS-GNN. Input: G = (V, A, X), k |
| Open Source Code | Yes | The code is available at: https://github.com/google-research/google-research/tree/master/rs_gnn. |
| Open Datasets | Yes | We use eight established benchmarks in the GNN literature: three citation networks namely Cora, CiteSeer, and Pubmed Sen et al. (2008); Hu et al. (2020), a citation network named OGBN-Arxiv Hu et al. (2020) which is orders of magnitude larger than the previous three, two datasets from Amazon products (Photos and PC) Shchur et al. (2018), and two datasets from Microsoft Academic (CS and Physics) Shchur et al. (2018). |
| Dataset Splits | Yes | We randomly split the remaining nodes in (V \ S) into validation and test sets by selecting 500 nodes for validation and the rest for testing. |
| Hardware Specification | Yes | Our experiments were done on a TPU v2 for all datasets except for the Arxiv dataset where we used a TPU v3 as the experiments with the Arxiv dataset require more memory. |
| Software Dependencies | No | The paper mentions libraries like Jax/Flax, Jraph, scikit-learn, and scikit-learn-extra with their respective publication citations, but it does not specify explicit version numbers for these software dependencies used in the experiments (e.g., Python 3.8, PyTorch 1.9). |
| Experiment Setup | Yes | We set the learning rate to 0.001 and optimized all model parameters (including the DGI parameters and the center parameters) jointly. For the experiments that had access to the original graph structure, we set the DGI hidden dimension to 512 for all datasets except for the Arxiv dataset where we set it to 256 to reduce memory usage. For the experiments with no access to the original graph structure, we set the DGI hidden dimension to 128 as there exists less signal in this case. We trained the DGI models for 2000 epochs both for our model and the baselines. For our model, we set λ in the main loss function to 0.001 for all datasets. Also, for the experiments where a graph structure is not provided as input, to create a kNN graph we connect each node to its closest 15 nodes for all the datasets. For the classification GCN model, we used a two-layer GCN model with PReLU activations (He et al., 2015) and with a hidden dimension of 32. We added a dropout layer after the first layer with a drop rate of 0.5. The weight decay was set to 5e-4. |
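The Experiment Setup row pins down enough detail to sketch two classifier-side pieces of the pipeline: building the kNN graph (k = 15) when no graph is given, and the two-layer GCN forward pass. The sketch below is illustrative, not the authors' code (which is JAX/Flax, linked above): the Euclidean metric and the symmetrization of the kNN graph are assumptions the paper does not state, the PReLU slope is a fixed stand-in for the learned parameter, and training-time dropout (rate 0.5) and weight decay (5e-4) are omitted.

```python
import numpy as np

def build_knn_graph(X: np.ndarray, k: int = 15) -> np.ndarray:
    """Connect each node to its k closest feature neighbors (k=15 in the paper).

    Euclidean distance and making the graph undirected are assumptions;
    the paper specifies only k.
    """
    n = X.shape[0]
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                    # exclude self-edges
    nbrs = np.argsort(d2, axis=1)[:, :k]            # k nearest per node
    A = np.zeros((n, n), dtype=np.float32)
    A[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    return np.maximum(A, A.T)                       # symmetrize

def prelu(x: np.ndarray, alpha: float = 0.25) -> np.ndarray:
    """PReLU (He et al., 2015); alpha stands in for the learned slope."""
    return np.where(x > 0, x, alpha * x)

def gcn_forward(A: np.ndarray, X: np.ndarray,
                W1: np.ndarray, W2: np.ndarray) -> np.ndarray:
    """Two-layer GCN forward pass; W1.shape[1] is the hidden dim (32 in the paper).

    Dropout after the first layer is a training-time step and is omitted here.
    """
    A_hat = A + np.eye(A.shape[0], dtype=A.dtype)   # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # D^-1/2 (A+I) D^-1/2
    H = prelu(A_norm @ X @ W1)
    return A_norm @ H @ W2                          # per-node class logits
```

Feeding `build_knn_graph(X)` into `gcn_forward` reproduces the shape of the no-graph setting; when the original adjacency is available it is passed in directly instead.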