Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Normed Spaces for Graph Embedding

Authors: Diaaeldin Taha, Wei Zhao, J. Maxwell Riestenberg, Michael Strube

TMLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the representational capacity of normed spaces on synthetic and real-world benchmark graph datasets through a graph reconstruction task. Our empirical results corroborate the theoretical motivation; as observed in our experiments, diverse classes of graphs with varying structures can be embedded in low-dimensional normed spaces with low average distortion. Second, we find that normed spaces consistently outperform Euclidean spaces, hyperbolic spaces, Cartesian products of these spaces, Siegel spaces, and spaces of SPD matrices across test setups. Further empirical analysis shows that the embedding capacity of normed spaces remains robust across varying graph curvatures and with increasing graph sizes. Moreover, the computational resource requirements for normed spaces grow much slower than other Riemannian manifold alternatives as the graph size increases. Lastly, we showcase the versatility of normed spaces in two applied graph embedding tasks, namely, link prediction and recommender systems, with the ℓ1 normed space surpassing the baseline spaces.
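The excerpt above evaluates embeddings by their average distortion on a graph reconstruction task. As a minimal sketch of what such a metric looks like, the snippet below computes a standard average relative distortion of an ℓp embedding against graph distances; the function name and exact formulation are illustrative assumptions, not taken from the paper's codebase.

```python
import itertools
import numpy as np

def average_distortion(X, D, p=1):
    """Average relative distortion of an embedding X (one row per node)
    under the l_p norm, against pairwise graph distances D.

    This is one standard definition: mean over node pairs of
    |d_emb(u, v) - d_G(u, v)| / d_G(u, v). The paper's exact
    definition may differ in detail.
    """
    n = len(X)
    errors = []
    for i, j in itertools.combinations(range(n), 2):
        d_emb = np.linalg.norm(X[i] - X[j], ord=p)
        errors.append(abs(d_emb - D[i, j]) / D[i, j])
    return float(np.mean(errors))
```

For example, embedding a path graph on three nodes along a line reproduces the graph metric exactly under the ℓ1 norm, giving zero distortion.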
Researcher Affiliation | Academia | Diaaeldin Taha EMAIL Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany Wei Zhao EMAIL University of Aberdeen, Aberdeen, United Kingdom J. Maxwell Riestenberg EMAIL Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany Michael Strube EMAIL Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
Pseudocode | No | The paper describes methodologies and mathematical frameworks, but it does not include any explicit pseudocode blocks or algorithms labeled as such. All procedural descriptions are in paragraph form.
Open Source Code | Yes | We make our code and data publicly available: https://github.com/andyweizhao/graphs-normed-spaces
Open Datasets | Yes | We evaluate the representational capacity of metric spaces on five popular real-world graph networks. These include (a) USCA312 (Hahsler & Hornik, 2007) and Euro Road (Šubelj & Bajec, 2011), representing North American city networks and European road systems respectively; (b) bio-diseasome (Goh et al., 2007), a biological graph representing the relationships between human disorders and diseases and their genetic origins; (c) CSPHD (Nooy et al., 2011), a graph of Ph.D. advisor-advisee relationships in computer science and (d) Facebook (McAuley & Leskovec, 2012), a dense social network from Facebook.
Dataset Splits | Yes | We split each dataset into train, development and test sets corresponding to 70%, 10%, 20% of citation links that we sample at random. We report the average performance in AUC across five runs. We provide the training details in Appendix D.2. ... We use the train/dev/test sets of these datasets from the work of López et al. (2021), and report the average results across five runs in terms of two evaluation metrics: hit ratio (HR) and normalized discounted cumulative gain (nDCG), both at 10. We provide the training details and the statistics of the graphs in Appendix D.3.
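The split described above (70% train, 10% development, 20% test, sampled at random) can be sketched as follows; the helper name and seeding are illustrative assumptions, not from the paper's codebase.

```python
import random

def split_edges(edges, seed=0):
    """Shuffle edges with a fixed seed and split them 70% / 10% / 20%
    into train / dev / test sets, mirroring the random sampling
    described in the paper."""
    rng = random.Random(seed)
    edges = list(edges)
    rng.shuffle(edges)
    n = len(edges)
    n_train = int(0.7 * n)
    n_dev = int(0.1 * n)
    train = edges[:n_train]
    dev = edges[n_train:n_train + n_dev]
    test = edges[n_train + n_dev:]
    return train, dev, test
```

Fixing the seed per run and averaging metrics over several seeds matches the paper's practice of reporting averages across five runs.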
Hardware Specification | Yes | All experiments were executed on an Intel(R) Xeon(R) CPU E5-2650 computer, equipped with 48 CPUs operating at 2.2 GHz and a single Tesla P40 GPU with 24 GB of memory running on CUDA 11.2.
Software Dependencies | Yes | All experiments were executed on an Intel(R) Xeon(R) CPU E5-2650 computer, equipped with 48 CPUs operating at 2.2 GHz and a single Tesla P40 GPU with 24 GB of memory running on CUDA 11.2. The implementations of all baselines are taken from Geoopt (Kochurov et al., 2020) and López et al. (2021).
Experiment Setup | Yes | In all setups, we use the RAdam optimizer (Bécigneul & Ganea, 2019), and run the same grid search to train graph embeddings. The implementations of all baselines are taken from Geoopt (Kochurov et al., 2020) and López et al. (2021). We train for 3000 epochs, and stop training when the average distortion has not decreased for 200 epochs. We experiment with three hyperparameters: (a) learning rate {0.1, 0.01, 0.001}; (b) batch size {512, 1024, 2048, 1} with 1 as the node count within a graph and (c) maximum gradient norm {10, 50, 250}.
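The hyperparameter grid quoted above (3 learning rates × 4 batch sizes × 3 gradient-norm caps = 36 configurations) can be enumerated as below; the dictionary layout is an illustrative assumption, and -1 is used here as a stand-in for the "batch size equals the node count of the graph" option.

```python
import itertools

# Grid reported in the paper; -1 stands in for "all nodes in the graph"
# (the paper sets that batch size to the graph's node count).
grid = {
    "learning_rate": [0.1, 0.01, 0.001],
    "batch_size": [512, 1024, 2048, -1],
    "max_grad_norm": [10, 50, 250],
}

def grid_configs(grid):
    """Yield every combination of hyperparameter values as a dict."""
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

configs = list(grid_configs(grid))  # 3 * 4 * 3 = 36 configurations
```

Each configuration would then be trained for up to 3000 epochs with early stopping once average distortion has not improved for 200 epochs, as the quoted setup describes.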