DeepWalking Backwards: From Embeddings Back to Graphs

Authors: Sudhanshu Chanpuriya, Cameron Musco, Konstantinos Sotiropoulos, Charalampos Tsourakakis

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform numerous experiments on real-world networks, observing that significant information about G, such as specific edges and bulk properties like triangle density, is often lost in G̃ (the graph reconstructed from the embedding). However, community structure is often preserved or even enhanced. Our findings are a step towards a more rigorous understanding of exactly what information embeddings encode about the input graph, and why this information is useful for learning tasks.
Researcher Affiliation | Academia | ¹College of Information and Computer Sciences, University of Massachusetts Amherst, Amherst, MA, USA. ²Department of Computer Science, Boston University, Boston, MA, USA. ³ISI Foundation, Turin, Italy. Correspondence to: Sudhanshu Chanpuriya <umass>.
Pseudocode | Yes | The overall procedure is given in Algorithm 1. Pseudocode is given in Algorithm 2.
Open Source Code | Yes | All code is written in Python and is available at https://github.com/konsotirop/Invert_Embeddings.
Open Datasets | Yes | All datasets we use are publicly available: see Qiu et al. (2018) for BLOGCATALOG and PPI, Sen et al. (2008) for CITESEER and CORA, and SNAP (Leskovec & Krevl, 2014) for EMAIL and YOUTUBE.
Dataset Splits | No | The paper mentions sampling a portion of nodes for training and inferring labels for the remaining nodes, but does not explicitly define a separate validation split or its size. (A hedged sketch of such a split follows the table.)
Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as GPU/CPU models or cloud instance types.
Software Dependencies | No | Our implementation uses PyTorch (Paszke et al., 2019) for automatic differentiation and minimizes the loss using the SciPy (Jones et al., 2001) implementation of the L-BFGS (Liu & Nocedal, 1989; Zhu et al., 1997) algorithm with default hyperparameters and up to a maximum of 500 iterations. (No version numbers for PyTorch or SciPy are given; the optimization pattern is sketched after the table.)
Experiment Setup | Yes | Hyperparameter settings. We experiment with a set of different values for the embedding dimension k, starting from 2⁴ and incrementing in powers of 2, up to 2¹¹ = 2048, except for the EMAIL dataset, which has fewer than 2¹⁰ nodes. For this dataset we only test for k up to 2⁹. Throughout the experiments, we set the window-size T to 10, as this is the most commonly used value in downstream machine learning tasks. (The sweep is spelled out in the final sketch after the table.)
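
The evaluation protocol quoted in the Dataset Splits row (sample a portion of nodes for training, infer labels for the rest) is a standard node-classification setup. A minimal sketch, assuming scikit-learn; the label array and training fraction below are illustrative placeholders, not values from the paper, which defines no explicit validation split:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: node indices and multi-class labels (both assumptions).
num_nodes = 1000
node_ids = np.arange(num_nodes)
labels = np.random.randint(0, 5, size=num_nodes)

train_frac = 0.1  # assumed fraction of nodes sampled for training
train_ids, test_ids = train_test_split(
    node_ids, train_size=train_frac, stratify=labels, random_state=0
)
# A classifier would be fit on the embeddings of train_ids and used to infer
# labels for test_ids, matching the "sample a portion, infer the rest" protocol.
```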
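
The Software Dependencies row describes a common optimization pattern: PyTorch supplies the loss value and gradient via automatic differentiation, and SciPy's L-BFGS driver consumes them. A minimal sketch of that pattern with a stand-in quadratic loss (the paper's actual inversion objective is not reproduced here):

```python
import numpy as np
import torch
from scipy.optimize import minimize

def loss_fn(x_t: torch.Tensor) -> torch.Tensor:
    # Stand-in differentiable loss; the paper minimizes its embedding
    # inversion objective here instead.
    return torch.sum(x_t ** 2)

def value_and_grad(x: np.ndarray):
    # SciPy passes float64 NumPy arrays; convert at the boundary and let
    # PyTorch autodiff produce the gradient.
    x_t = torch.tensor(x, dtype=torch.float64, requires_grad=True)
    loss = loss_fn(x_t)
    loss.backward()
    return loss.item(), x_t.grad.numpy()

x0 = np.random.rand(100)  # illustrative initial point
result = minimize(
    value_and_grad,
    x0,
    jac=True,                  # the objective returns (loss, gradient)
    method="L-BFGS-B",         # SciPy's L-BFGS implementation
    options={"maxiter": 500},  # the paper caps L-BFGS at 500 iterations
)
```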
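
The dimension sweep quoted in the Experiment Setup row translates directly into code. A small sketch; the node counts passed in at the bottom are illustrative, not taken from the paper:

```python
WINDOW_SIZE_T = 10  # window size used throughout the paper's experiments

def embedding_dims(num_nodes: int) -> list[int]:
    # Powers of 2 from 2^4 up to 2^11, except graphs with fewer than
    # 2^10 nodes (EMAIL), which are only swept up to 2^9.
    max_exp = 9 if num_nodes < 2 ** 10 else 11
    return [2 ** i for i in range(4, max_exp + 1)]

print(embedding_dims(1000))   # EMAIL-sized graph: [16, 32, ..., 512]
print(embedding_dims(10000))  # larger graph:      [16, 32, ..., 2048]
```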