DeepWalking Backwards: From Embeddings Back to Graphs

Authors: Sudhanshu Chanpuriya, Cameron Musco, Konstantinos Sotiropoulos, Charalampos Tsourakakis

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform numerous experiments on real-world networks, observing that significant information about G, such as specific edges and bulk properties like triangle density, is often lost in G̃ (the graph reconstructed from the embedding). However, community structure is often preserved or even enhanced. Our findings are a step towards a more rigorous understanding of exactly what information embeddings encode about the input graph, and why this information is useful for learning tasks.
Researcher Affiliation | Academia | ¹College of Information and Computer Sciences, University of Massachusetts Amherst, Amherst, MA, USA. ²Department of Computer Science, Boston University, Boston, MA, USA. ³ISI Foundation, Turin, Italy. Correspondence to: Sudhanshu Chanpuriya <umass>.
Pseudocode | Yes | The overall procedure is given in Algorithm 1. Pseudocode is given in Algorithm 2.
Open Source Code | Yes | All code is written in Python and is available at https://github.com/konsotirop/Invert_Embeddings.
Open Datasets | Yes | All datasets we use are publicly available: see Qiu et al. (2018) for BLOGCATALOG and PPI, Sen et al. (2008) for CITESEER and CORA, and SNAP (Leskovec & Krevl, 2014) for EMAIL and YOUTUBE.
Dataset Splits | No | The paper mentions sampling a portion of nodes for training and inferring labels for the remaining nodes, but does not explicitly define a separate validation split or its size. (A hedged sketch of such a split follows the table.)
Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as GPU/CPU models or cloud instance types.
Software Dependencies | No | Our implementation uses PyTorch (Paszke et al., 2019) for automatic differentiation and minimizes the loss using the SciPy (Jones et al., 2001) implementation of the L-BFGS (Liu & Nocedal, 1989; Zhu et al., 1997) algorithm with default hyperparameters and up to a maximum of 500 iterations. (No version numbers for PyTorch or SciPy are given; the optimization pattern is sketched after the table.)
Experiment Setup | Yes | Hyperparameter settings. We experiment with a set of different values for the embedding dimension k, starting from 2⁴ and incrementing in powers of 2, up to 2¹¹ = 2048, except for the EMAIL dataset, which has fewer than 2¹⁰ nodes. For this dataset we only test for k up to 2⁹. Throughout the experiments, we set the window-size T to 10, as this is the most commonly used value in downstream machine learning tasks. (The sweep is spelled out in the final sketch after the table.)
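
The evaluation protocol quoted in the Dataset Splits row (sample a portion of nodes for training, infer labels for the rest) is a standard node-classification setup. A minimal sketch, assuming scikit-learn; the label array and training fraction below are illustrative placeholders, not values from the paper, which defines no explicit validation split:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: node indices and multi-class labels (both assumptions).
num_nodes = 1000
node_ids = np.arange(num_nodes)
labels = np.random.randint(0, 5, size=num_nodes)

train_frac = 0.1  # assumed fraction of nodes sampled for training
train_ids, test_ids = train_test_split(
    node_ids, train_size=train_frac, stratify=labels, random_state=0
)
# A classifier would be fit on the embeddings of train_ids and used to infer
# labels for test_ids, matching the "sample a portion, infer the rest" protocol.
```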
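
The Software Dependencies row describes a common optimization pattern: PyTorch supplies the loss value and gradient via automatic differentiation, and SciPy's L-BFGS driver consumes them. A minimal sketch of that pattern with a stand-in quadratic loss (the paper's actual inversion objective is not reproduced here):

```python
import numpy as np
import torch
from scipy.optimize import minimize

def loss_fn(x_t: torch.Tensor) -> torch.Tensor:
    # Stand-in differentiable loss; the paper minimizes its embedding
    # inversion objective here instead.
    return torch.sum(x_t ** 2)

def value_and_grad(x: np.ndarray):
    # SciPy passes float64 NumPy arrays; convert at the boundary and let
    # PyTorch autodiff produce the gradient.
    x_t = torch.tensor(x, dtype=torch.float64, requires_grad=True)
    loss = loss_fn(x_t)
    loss.backward()
    return loss.item(), x_t.grad.numpy()

x0 = np.random.rand(100)  # illustrative initial point
result = minimize(
    value_and_grad,
    x0,
    jac=True,                  # the objective returns (loss, gradient)
    method="L-BFGS-B",         # SciPy's L-BFGS implementation
    options={"maxiter": 500},  # the paper caps L-BFGS at 500 iterations
)
```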
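
The dimension sweep quoted in the Experiment Setup row translates directly into code. A small sketch; the node counts passed in at the bottom are illustrative, not taken from the paper:

```python
WINDOW_SIZE_T = 10  # window size used throughout the paper's experiments

def embedding_dims(num_nodes: int) -> list[int]:
    # Powers of 2 from 2^4 up to 2^11, except graphs with fewer than
    # 2^10 nodes (EMAIL), which are only swept up to 2^9.
    max_exp = 9 if num_nodes < 2 ** 10 else 11
    return [2 ** i for i in range(4, max_exp + 1)]

print(embedding_dims(1000))   # EMAIL-sized graph: [16, 32, ..., 512]
print(embedding_dims(10000))  # larger graph:      [16, 32, ..., 2048]
```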