PRUNE: Preserving Proximity and Global Ranking for Network Embedding

Authors: Yi-An Lai, Chin-Chi Hsu, Wen-Hao Chen, Mi-Yen Yeh, Shou-De Lin

NeurIPS 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiment results not only verify the above design properties but also demonstrate the superior performance in learning-to-rank, classification, regression, and link prediction tasks.
Researcher Affiliation | Academia | Yi-An Lai (National Taiwan University, b99202031@ntu.edu.tw); Chin-Chi Hsu (Academia Sinica, chinchi@iis.sinica.edu.tw); Wen-Hao Chen (National Taiwan University, b02902023@ntu.edu.tw); Mi-Yen Yeh (Academia Sinica, miyen@iis.sinica.edu.tw); Shou-De Lin (National Taiwan University, sdlin@csie.ntu.edu.tw)
Pseudocode | No | The paper describes the model mathematically and textually but does not include any pseudocode or algorithm blocks.
Open Source Code | Yes | The source code of the proposed model can be downloaded from https://github.com/ntumslab/PRUNE (footnote 6 in the paper).
Open Datasets | Yes | (I) Hep-Ph: a paper citation network from 1993 to 2003 with 34,546 papers and 421,578 citation relationships. Following the same setup as [25], citations before 1999 are kept for embedding generation, and paper ranks are then evaluated against the number of citations after 2000. (II) Webspam: a web-page network used in the Webspam Challenges, containing 114,529 web pages and 1,836,441 hyperlinks; participants are challenged to build a model that ranks the 1,933 labeled non-spam pages higher than the 122 labeled spam ones. (III) FB Wall Post: a previous task [7] aims at ranking active users in a 63,731-user, 831,401-link wall-post network from the Facebook New Orleans 2009 dataset; nodes denote users and a link means one user posted at least one article on another's wall. 14,862 users are marked active, i.e., they continue to post articles in the three weeks after a certain date.
Dataset Splits | No | The paper states 'We train on 80% and evaluate on 20% of datasets.', 'We only observe 80% nodes while training and predict the labels of remaining 20% nodes.', and 'We randomly split network edges into 80%-20% train-test subsets.' This indicates train/test splits, but no separate validation split is explicitly mentioned (a minimal sketch of such a split appears after the table).
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions activation functions (ELU, softplus, ReLU) and the Adam optimizer but does not provide version numbers for any software libraries, frameworks, or programming languages (e.g., Python, TensorFlow, PyTorch).
Experiment Setup | Yes | For all experiments, the model fixes node embeddings and hidden layers to be 128-dimensional and the proximity representation to be 64-dimensional. Exponential Linear Unit (ELU) [4] activation is adopted in hidden layers for faster learning, while output layers use softplus activation for the node ranking score and Rectified Linear Unit (ReLU) [5] activation for the proximity representation, to avoid negative-or-zero scores as well as negative representation values. The authors recommend and fix α = 5, λ = 0.01. All training uses a batch size of 1024 and the Adam [9] optimizer with learning rate 0.0001 (a configuration sketch appears after the table).
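
For reference, the following is a minimal sketch of the 80%-20% random edge split quoted in the Dataset Splits row. It assumes an edge list already loaded as an array; the function name, seed handling, and use of NumPy are illustrative choices, not taken from the released PRUNE code.

```python
import numpy as np

def split_edges(edges, train_ratio=0.8, seed=0):
    """Randomly split network edges into 80%-20% train-test subsets,
    mirroring the split described for the link-prediction experiments."""
    edges = np.asarray(edges)
    perm = np.random.default_rng(seed).permutation(len(edges))
    cut = int(train_ratio * len(edges))
    return edges[perm[:cut]], edges[perm[cut:]]

# Example usage: train_edges, test_edges = split_edges(edge_list)
```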
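
The Experiment Setup row can likewise be summarized as a hedged configuration sketch. The PyTorch framing, the class name PruneEncoder, and the single-hidden-layer layout per head are assumptions made for illustration; the sketch is not based on the authors' released implementation, and the PRUNE objective itself (the proximity/ranking loss weighted by α and λ) is not reproduced here.

```python
import torch
import torch.nn as nn

class PruneEncoder(nn.Module):
    """Per-node network with the dimensions and activations quoted above:
    128-d node embedding, 128-d ELU hidden layer, a softplus ranking-score
    head, and a 64-d ReLU proximity-representation head."""
    def __init__(self, num_nodes, emb_dim=128, hidden_dim=128, prox_dim=64):
        super().__init__()
        self.embedding = nn.Embedding(num_nodes, emb_dim)
        self.hidden = nn.Sequential(nn.Linear(emb_dim, hidden_dim), nn.ELU())
        self.rank_head = nn.Sequential(nn.Linear(hidden_dim, 1), nn.Softplus())
        self.prox_head = nn.Sequential(nn.Linear(hidden_dim, prox_dim), nn.ReLU())

    def forward(self, node_ids):
        # node_ids: LongTensor of node indices in the current batch
        h = self.hidden(self.embedding(node_ids))
        return self.rank_head(h), self.prox_head(h)

# Hyperparameters reported in the paper: alpha = 5, lambda = 0.01,
# batch size 1024, Adam with learning rate 1e-4.
model = PruneEncoder(num_nodes=34546)  # e.g., the Hep-Ph node count
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```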