HARP: Hierarchical Representation Learning for Networks
Authors: Haochen Chen, Bryan Perozzi, Yifan Hu, Steven Skiena
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that applying HARP's hierarchical paradigm yields improved implementations for all three of these methods, as evaluated on classification tasks on real-world graphs such as DBLP, BlogCatalog, and CiteSeer, where we achieve a performance gain over the original implementations of up to 14% Macro F1. |
| Researcher Affiliation | Collaboration | Haochen Chen (Stony Brook University, haocchen@cs.stonybrook.edu); Bryan Perozzi (Google Research, bperozzi@acm.org); Yifan Hu (Yahoo! Research, yifanhu@oath.com); Steven Skiena (Stony Brook University, skiena@cs.stonybrook.edu) |
| Pseudocode | Yes | Algorithm 1 HARP(G, EMBED()). Input: graph G(V, E); arbitrary graph embedding algorithm EMBED(). Output: matrix of vertex representations Φ ∈ ℝ^(\|V\|×d). 1: G0, G1, ..., GL ← GRAPHCOARSENING(G); 2: Initialize Φ′\_GL by assigning zeros; 3: Φ\_GL ← EMBED(GL, Φ′\_GL); 4: for i = L−1 to 0 do; 5: Φ′\_Gi ← PROLONGATE(Φ\_Gi+1, Gi+1, Gi); 6: Φ\_Gi ← EMBED(Gi, Φ′\_Gi); 7: end for; 8: return Φ\_G0 |
| Open Source Code | No | The paper does not include an explicit statement or link providing concrete access to the source code for the HARP methodology described in the paper. |
| Open Datasets | Yes | Table 1 (Statistics of the graphs used in our experiments): DBLP — 29,199 vertices, 133,664 edges, 4 classes, classification; BlogCatalog — 10,312 vertices, 333,983 edges, 39 classes, classification; CiteSeer — 3,312 vertices, 4,732 edges, 6 classes, classification. DBLP (Perozzi et al. 2017) is a co-author graph of researchers in computer science. BlogCatalog (Tang and Liu 2009) is a network of social relationships between users on the BlogCatalog website. CiteSeer (Sen et al. 2008) is a citation network between publications in computer science. |
| Dataset Splits | No | The paper states that 'a portion (TR) of nodes along with their labels are randomly sampled from the graph as training data, and the task is to predict the labels for the remaining nodes.' It specifies training percentages of 5%, 50%, and 5% labeled nodes for the three graphs respectively, but does not mention a separate validation split. |
| Hardware Specification | Yes | All models run on a single machine with 128 GB memory, 24 CPU cores at 2.0 GHz, and 20 threads. |
| Software Dependencies | No | The paper mentions using LibLinear for logistic regression but does not provide a version number for it or for any other software library or dependency used in the experiments. |
| Experiment Setup | Yes | In HARP(DW), the parameter setting is γ = 40, t = 10, w = 10, d = 128. For HARP(LINE), we run 50 iterations on all graph edges on all coarsening levels. The representation size d is set to 64 for both LINE and HARP(LINE). Both in-out and return hyperparameters are set to 1.0. For all models, the initial learning rate and final learning rate are set to 0.025 and 0.001 respectively. |
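To make the quoted Algorithm 1 concrete, here is a minimal, self-contained Python sketch of HARP's coarsen / embed / prolongate loop. The helpers `coarsen()` and `embed()` are hypothetical toy stand-ins: the paper performs edge and star collapsing and plugs in DeepWalk, LINE, or node2vec as EMBED(), while this sketch collapses greedily matched edges and "embeds" by averaging neighbour vectors. A small random initialisation replaces the paper's zero initialisation at the coarsest level, since the averaging placeholder would otherwise leave zeros unchanged.

```python
import random

def coarsen(graph):
    """One coarsening level: greedily merge each unmatched node with an
    unmatched neighbour (toy stand-in for HARP's edge/star collapsing).
    Returns the coarser graph and the fine-to-coarse node mapping."""
    merge = {}  # fine node -> coarse node id
    coarse_id = 0
    for u in sorted(graph):
        if u in merge:
            continue
        partner = next((v for v in sorted(graph[u]) if v not in merge), None)
        merge[u] = coarse_id
        if partner is not None:
            merge[partner] = coarse_id
        coarse_id += 1
    coarse = {c: set() for c in range(coarse_id)}
    for u, nbrs in graph.items():
        for v in nbrs:
            if merge[u] != merge[v]:
                coarse[merge[u]].add(merge[v])
                coarse[merge[v]].add(merge[u])
    return coarse, merge

def embed(graph, init, d=4, iters=5):
    """Placeholder for EMBED(): refine the initial vectors by averaging
    with neighbours (standing in for DeepWalk / LINE / node2vec)."""
    phi = {u: list(init[u]) for u in graph}
    for _ in range(iters):
        for u in graph:
            if not graph[u]:
                continue
            for k in range(d):
                mean = sum(phi[v][k] for v in graph[u]) / len(graph[u])
                phi[u][k] = 0.5 * phi[u][k] + 0.5 * mean
    return phi

def harp(graph, d=4, min_size=2):
    """Algorithm 1: embed the coarsest graph, then prolongate and
    re-embed level by level back down to the original graph."""
    # Line 1: build the hierarchy G0, G1, ..., GL.
    levels, mappings = [graph], []
    while len(levels[-1]) > min_size:
        coarse, merge = coarsen(levels[-1])
        if len(coarse) == len(levels[-1]):
            break  # no progress (e.g. no edges left to collapse)
        levels.append(coarse)
        mappings.append(merge)
    # Lines 2-3: embed the coarsest graph GL (small random init here,
    # in place of the paper's zeros, so the toy embed() makes progress).
    rng = random.Random(0)
    phi = embed(levels[-1],
                {u: [rng.gauss(0, 0.1) for _ in range(d)]
                 for u in levels[-1]}, d)
    # Lines 4-7: PROLONGATE copies each coarse node's vector to the fine
    # nodes it absorbed, then EMBED refines at the finer level.
    for i in range(len(mappings) - 1, -1, -1):
        init = {u: phi[mappings[i][u]] for u in levels[i]}
        phi = embed(levels[i], init, d)
    return phi  # Line 8: representations of G0's vertices
```

The key design point the sketch preserves is that each level's embedding only *initialises* the next finer level; the base embedding method still does the real work at every scale, which is why HARP composes with DeepWalk, LINE, and node2vec unchanged.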