Dink-Net: Neural Clustering on Large Graphs
Authors: Yue Liu, Ke Liang, Jun Xia, Sihang Zhou, Xihong Yang, Xinwang Liu, Stan Z. Li
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Both experimental results and theoretical analyses demonstrate the superiority of our method. Compared to the runner-up, Dink-Net achieves 9.62% NMI improvement on the ogbn-papers100M dataset with 111 million nodes and 1.6 billion edges. The experimental results are obtained from a server with four Intel(R) Xeon(R) Platinum 8358 CPUs @ 2.60GHz, one NVIDIA A100 GPU (40G), and the PyTorch platform. To evaluate the node clustering performance, we use seven attribute graph datasets, including Cora, CiteSeer, Amazon-Photo, ogbn-arxiv, Reddit, ogbn-products, and ogbn-papers100M (Hu et al., 2020). |
| Researcher Affiliation | Academia | Corresponding authors. Affiliations: National University of Defense Technology; Westlake University. Email: Yue Liu <yueliu19990731@163.com>, Xinwang Liu <xinwangliu@nudt.edu.cn>, Stan Z. Li <Stan.ZQ.Li@westlake.edu.cn>. |
| Pseudocode | Yes | The overall workflow of our proposed Dink-Net is demonstrated in Algorithm 1 and the PyTorch-style pseudo-code is given in Appendix E. |
| Open Source Code | Yes | The source code is released: https://github.com/yueliu1999/Dink-Net |
| Open Datasets | Yes | To evaluate the node clustering performance, we use seven attribute graph datasets, including Cora, CiteSeer, Amazon-Photo, ogbn-arxiv, Reddit, ogbn-products, and ogbn-papers100M (Hu et al., 2020). Appendix G. URLs of Used Datasets: Cora: https://docs.dgl.ai/#CoraGraphDataset, CiteSeer: https://docs.dgl.ai/#dgl.data.CiteseerGraphDataset, Amazon-Photo: https://docs.dgl.ai/#dgl.data.AmazonCoBuyPhotoDataset, ogbn-arxiv: https://ogb.stanford.edu/docs/nodeprop/#ogbn-arxiv, Reddit: https://docs.dgl.ai/#dgl.data.RedditDataset, ogbn-products: https://ogb.stanford.edu/docs/nodeprop/#ogbn-products, ogbn-papers100M: https://ogb.stanford.edu/docs/nodeprop/#ogbn-papers100M |
| Dataset Splits | No | The paper lists datasets and discusses evaluation metrics but does not explicitly provide details about train/validation/test splits (e.g., percentages or specific sample counts) in the main text or appendices. |
| Hardware Specification | Yes | Experimental results are obtained from a server with four Intel(R) Xeon(R) Platinum 8358 CPUs @ 2.60GHz, one NVIDIA A100 GPU (40G), and the PyTorch platform. |
| Software Dependencies | No | The paper mentions the 'PyTorch platform' but does not specify a version number for PyTorch or any other software libraries used. While GCN and MLP are mentioned, no specific version numbers for these implementations or associated libraries are provided. |
| Experiment Setup | Yes | Appendix C. Design Details & Hyper-parameter Settings: Table 4. Hyper-parameter settings of our proposed method. This table lists specific values for T (pre-training epochs), T' (fine-tuning epochs), β (pre-training learning rate), β' (fine-tuning learning rate), λ (trade-off parameter), B (batch size), and d (latent feature dimension number) for each dataset. |
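The table above cites a 9.62% NMI improvement as the headline result. For readers unfamiliar with the metric, below is a minimal, self-contained sketch of normalized mutual information between two cluster assignments. Note the normalization convention (square-root of the entropies here) is an assumption; the paper does not state which variant it uses, and min- or mean-normalized forms also appear in the literature.

```python
from collections import Counter
from math import log, sqrt

def nmi(labels_true, labels_pred):
    """Normalized mutual information between two flat clusterings.

    Sketch using NMI = I(U;V) / sqrt(H(U) * H(V)); other normalizations
    (min, arithmetic mean) exist and can give slightly different values.
    """
    n = len(labels_true)
    assert n == len(labels_pred) and n > 0
    # Marginal cluster sizes and joint contingency counts.
    count_u = Counter(labels_true)
    count_v = Counter(labels_pred)
    joint = Counter(zip(labels_true, labels_pred))
    # Entropies H(U) and H(V) in nats.
    h_u = -sum((c / n) * log(c / n) for c in count_u.values())
    h_v = -sum((c / n) * log(c / n) for c in count_v.values())
    # Mutual information I(U;V) over the observed joint cells.
    mi = sum(
        (c / n) * log((c / n) / ((count_u[u] / n) * (count_v[v] / n)))
        for (u, v), c in joint.items()
    )
    if h_u == 0.0 or h_v == 0.0:  # degenerate single-cluster partition
        return 0.0
    return mi / sqrt(h_u * h_v)
```

NMI is invariant to label permutation, which is why it is standard for clustering evaluation: `nmi([0, 0, 1, 1], [1, 1, 0, 0])` is 1.0 even though the label IDs disagree, while independent partitions score 0.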