HARP: Hierarchical Representation Learning for Networks
Authors: Haochen Chen, Bryan Perozzi, Yifan Hu, Steven Skiena
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that applying HARP's hierarchical paradigm yields improved implementations for all three of these methods, as evaluated on classification tasks on real-world graphs such as DBLP, BlogCatalog, and CiteSeer, where we achieve a performance gain over the original implementations of up to 14% Macro F1. |
| Researcher Affiliation | Collaboration | Haochen Chen (Stony Brook University, haocchen@cs.stonybrook.edu); Bryan Perozzi (Google Research, bperozzi@acm.org); Yifan Hu (Yahoo! Research, yifanhu@oath.com); Steven Skiena (Stony Brook University, skiena@cs.stonybrook.edu) |
| Pseudocode | Yes | Algorithm 1 HARP(G, EMBED()). Input: graph G(V, E); arbitrary graph embedding algorithm EMBED(). Output: matrix of vertex representations Φ ∈ ℝ^(\|V\|×d). 1: G0, G1, ..., GL ← GRAPHCOARSENING(G); 2: Initialize Φ′\_GL by assigning zeros; 3: Φ\_GL ← EMBED(GL, Φ′\_GL); 4: for i = L−1 to 0 do; 5: Φ′\_Gi ← PROLONGATE(Φ\_Gi+1, Gi+1, Gi); 6: Φ\_Gi ← EMBED(Gi, Φ′\_Gi); 7: end for; 8: return Φ\_G0 |
| Open Source Code | No | The paper does not include an explicit statement or link providing concrete access to the source code for the HARP methodology described in the paper. |
| Open Datasets | Yes | Table 1 (Statistics of the graphs used in our experiments): DBLP — 29,199 vertices, 133,664 edges, 4 classes, classification; BlogCatalog — 10,312 vertices, 333,983 edges, 39 classes, classification; CiteSeer — 3,312 vertices, 4,732 edges, 6 classes, classification. DBLP (Perozzi et al. 2017) is a co-author graph of researchers in computer science. BlogCatalog (Tang and Liu 2009) is a network of social relationships between users on the BlogCatalog website. CiteSeer (Sen et al. 2008) is a citation network between publications in computer science. |
| Dataset Splits | No | The paper states that 'a portion (TR) of nodes along with their labels are randomly sampled from the graph as training data, and the task is to predict the labels for the remaining nodes.' It specifies training percentages of 5%, 50%, and 5% labeled nodes for the three graphs respectively, but does not mention a separate validation split. |
| Hardware Specification | Yes | All models run on a single machine with 128 GB memory, 24 CPU cores at 2.0 GHz, and 20 threads. |
| Software Dependencies | No | The paper mentions using LibLinear for logistic regression but does not provide a version number for it or for any other software library or dependency used in the experiments. |
| Experiment Setup | Yes | In HARP(DW), the parameter setting is γ = 40, t = 10, w = 10, d = 128. For HARP(LINE), we run 50 iterations on all graph edges on all coarsening levels. The representation size d is set to 64 for both LINE and HARP(LINE). Both in-out and return hyperparameters are set to 1.0. For all models, the initial learning rate and final learning rate are set to 0.025 and 0.001 respectively. |
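To make the quoted Algorithm 1 concrete, here is a minimal, self-contained Python sketch of HARP's coarsen / embed / prolongate loop. The helpers `coarsen()` and `embed()` are hypothetical toy stand-ins: the paper performs edge and star collapsing and plugs in DeepWalk, LINE, or node2vec as EMBED(), while this sketch collapses greedily matched edges and "embeds" by averaging neighbour vectors. A small random initialisation replaces the paper's zero initialisation at the coarsest level, since the averaging placeholder would otherwise leave zeros unchanged.

```python
import random

def coarsen(graph):
    """One coarsening level: greedily merge each unmatched node with an
    unmatched neighbour (toy stand-in for HARP's edge/star collapsing).
    Returns the coarser graph and the fine-to-coarse node mapping."""
    merge = {}  # fine node -> coarse node id
    coarse_id = 0
    for u in sorted(graph):
        if u in merge:
            continue
        partner = next((v for v in sorted(graph[u]) if v not in merge), None)
        merge[u] = coarse_id
        if partner is not None:
            merge[partner] = coarse_id
        coarse_id += 1
    coarse = {c: set() for c in range(coarse_id)}
    for u, nbrs in graph.items():
        for v in nbrs:
            if merge[u] != merge[v]:
                coarse[merge[u]].add(merge[v])
                coarse[merge[v]].add(merge[u])
    return coarse, merge

def embed(graph, init, d=4, iters=5):
    """Placeholder for EMBED(): refine the initial vectors by averaging
    with neighbours (standing in for DeepWalk / LINE / node2vec)."""
    phi = {u: list(init[u]) for u in graph}
    for _ in range(iters):
        for u in graph:
            if not graph[u]:
                continue
            for k in range(d):
                mean = sum(phi[v][k] for v in graph[u]) / len(graph[u])
                phi[u][k] = 0.5 * phi[u][k] + 0.5 * mean
    return phi

def harp(graph, d=4, min_size=2):
    """Algorithm 1: embed the coarsest graph, then prolongate and
    re-embed level by level back down to the original graph."""
    # Line 1: build the hierarchy G0, G1, ..., GL.
    levels, mappings = [graph], []
    while len(levels[-1]) > min_size:
        coarse, merge = coarsen(levels[-1])
        if len(coarse) == len(levels[-1]):
            break  # no progress (e.g. no edges left to collapse)
        levels.append(coarse)
        mappings.append(merge)
    # Lines 2-3: embed the coarsest graph GL (small random init here,
    # in place of the paper's zeros, so the toy embed() makes progress).
    rng = random.Random(0)
    phi = embed(levels[-1],
                {u: [rng.gauss(0, 0.1) for _ in range(d)]
                 for u in levels[-1]}, d)
    # Lines 4-7: PROLONGATE copies each coarse node's vector to the fine
    # nodes it absorbed, then EMBED refines at the finer level.
    for i in range(len(mappings) - 1, -1, -1):
        init = {u: phi[mappings[i][u]] for u in levels[i]}
        phi = embed(levels[i], init, d)
    return phi  # Line 8: representations of G0's vertices
```

The key design point the sketch preserves is that each level's embedding only *initialises* the next finer level; the base embedding method still does the real work at every scale, which is why HARP composes with DeepWalk, LINE, and node2vec unchanged.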