Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Dink-Net: Neural Clustering on Large Graphs
Authors: Yue Liu, Ke Liang, Jun Xia, Sihang Zhou, Xihong Yang, Xinwang Liu, Stan Z. Li
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Both experimental results and theoretical analyses demonstrate the superiority of our method. Compared to the runner-up, Dink-Net achieves 9.62% NMI improvement on the ogbn-papers100M dataset with 111 million nodes and 1.6 billion edges. The experimental results are obtained from the server with four core Intel(R) Xeon(R) Platinum 8358 CPUs @ 2.60GHZ, one NVIDIA A100 GPU (40G), and the PyTorch platform. To evaluate the node clustering performance, we use seven attribute graph datasets, including Cora, CiteSeer, Amazon-Photo, ogbn-arxiv, Reddit, ogbn-products, ogbn-papers100M (Hu et al., 2020). |
| Researcher Affiliation | Academia | Corresponding Author. (1) National University of Defense Technology; (2) Westlake University. Email: Yue Liu <EMAIL>, Xinwang Liu <EMAIL>, Stan Z. Li <EMAIL>. |
| Pseudocode | Yes | The overall workflow of our proposed Dink-Net is demonstrated in Algorithm 1 and the PyTorch-style pseudo-code is given in Appendix E. |
| Open Source Code | Yes | The source code is released: Dink-Net. https://github.com/yueliu1999/Dink-Net |
| Open Datasets | Yes | To evaluate the node clustering performance, we use seven attribute graph datasets, including Cora, CiteSeer, Amazon-Photo, ogbn-arxiv, Reddit, ogbn-products, ogbn-papers100M (Hu et al., 2020). Appendix G. URLs of Used Datasets: Cora: https://docs.dgl.ai/#CoraGraphDataset, CiteSeer: https://docs.dgl.ai/#dgl.data.CiteseerGraphDataset, Amazon-Photo: https://docs.dgl.ai/#dgl.data.AmazonCoBuyPhotoDataset, ogbn-arxiv: https://ogb.stanford.edu/docs/nodeprop/#ogbn-arxiv, Reddit: https://docs.dgl.ai/#dgl.data.RedditDataset, ogbn-products: https://ogb.stanford.edu/docs/nodeprop/#ogbn-products, ogbn-papers100M: https://ogb.stanford.edu/docs/nodeprop/#ogbn-papers100M |
| Dataset Splits | No | The paper lists datasets and discusses evaluation metrics but does not explicitly provide details about train/validation/test splits (e.g., percentages or specific sample counts) in the main text or appendices. |
| Hardware Specification | Yes | Experimental results are obtained from the server with four core Intel(R) Xeon(R) Platinum 8358 CPUs @ 2.60GHZ, one NVIDIA A100 GPU (40G), and the PyTorch platform. |
| Software Dependencies | No | The paper mentions 'PyTorch platform' but does not specify a version number for PyTorch or any other software libraries used. While GCN and MLP are mentioned, no specific version numbers for these implementations or associated libraries are provided. |
| Experiment Setup | Yes | Appendix C. Design Details & Hyper-parameter Settings: Table 4. Hyper-parameter settings of our proposed method. This table lists specific values for T (pre-training epochs), T' (fine-tuning epochs), β (pre-training learning rate), β' (fine-tuning learning rate), λ (trade-off parameter), B (batch size), and d (latent feature dimension number) for each dataset. |
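
The Research Type entry above quotes an NMI (normalized mutual information) improvement as the headline clustering result. For readers unfamiliar with the metric, the following is a minimal sketch of how NMI between ground-truth labels and predicted cluster assignments can be computed. It assumes arithmetic-mean normalization of the two entropies, which is a common convention; this page does not state which normalization the paper uses, and the function name `nmi` is hypothetical, not taken from the Dink-Net code base.

```python
import math
from collections import Counter


def nmi(labels_true, labels_pred):
    """Normalized mutual information between two labelings.

    Assumption: normalizes by the arithmetic mean of the two entropies.
    Other normalizations (geometric mean, min, max) give different values.
    """
    n = len(labels_true)
    if n == 0 or n != len(labels_pred):
        raise ValueError("label lists must be non-empty and of equal length")

    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    count_joint = Counter(zip(labels_true, labels_pred))

    def entropy(counts):
        # Shannon entropy (natural log) of the empirical distribution.
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    h_true = entropy(count_true)
    h_pred = entropy(count_pred)

    # Mutual information: sum over co-occurring (true, pred) label pairs.
    mi = sum(
        (c / n) * math.log(c * n / (count_true[t] * count_pred[p]))
        for (t, p), c in count_joint.items()
    )

    denom = (h_true + h_pred) / 2
    if denom == 0:
        # Both labelings place every node in a single cluster.
        return 1.0
    return mi / denom
```

NMI is invariant to how cluster IDs are numbered, so a prediction that recovers the true partition under different labels scores the same as an exact match, while a prediction that is statistically independent of the true labels scores zero.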