Expected Probabilistic Hierarchies

Authors: Marcel Kollovieh, Bertrand Charpentier, Daniel Zügner, Stephan Günnemann

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate EPH on synthetic and real-world datasets, including vector and graph datasets. EPH outperforms all other approaches quantitatively and provides meaningful hierarchies in qualitative evaluations.
Researcher Affiliation | Collaboration | Marcel Kollovieh (1,2,3), Bertrand Charpentier (4), Daniel Zügner (5), Stephan Günnemann (1,2,3,4). Affiliations: (1) School of Computation, Information and Technology, Technical University of Munich; (2) Munich Data Science Institute; (3) Munich Center for Machine Learning; (4) Pruna AI; (5) Microsoft Research AI for Science.
Pseudocode | Yes | In the following, we provide a formal description of our EPH algorithm, the subgraph sampling, and how we normalize graphs. Algorithm 1 (EPH), Algorithm 2 (Normalize Graph), Algorithm 3 (Sample Subgraph).
Open Source Code | Yes | Code available at https://www.cs.cit.tum.de/daml/expected-probabilistic-hierarchies
Open Datasets | Yes | We use the datasets Polblogs [1], Brain [2], Citeseer [34], Genes [11], Cora-ML [25, 5], Open Flight [29], Wiki Physics [3], and DBLP [38]. ... we selected the seven datasets Zoo, Iris, Glass, Digits, Segmentation, Spambase, and Letter from the UCI Machine Learning repository [15]. Furthermore, we also use Cifar-100 [23].
Dataset Splits | No | Since we are in an unsupervised setting, we have no train/test split, i.e., we train and evaluate on the whole graph.
Hardware Specification | Yes | While Hyp HC, FPH, and EPH are executed on a GPU (NVIDIA A100), the remaining methods do not support or do not require GPU acceleration.
Software Dependencies | No | The paper mentions the optimizer used (PAdamax) and refers to some methods' implementations (e.g., Deep Walk embeddings [32]), but does not specify software library versions (e.g., PyTorch, Python, or CUDA versions).
Experiment Setup | Yes | We use n = 512 internal nodes, compress full hierarchies using the scheme presented by Charpentier and Bonald [8], and use Deep Walk embeddings [32] on the graphs for methods that require features. We train EPH using PAdamax (projected Adamax [21]), reduce the learning rate for B by a factor of 0.1 every 1000 epochs, and reset the probabilistic hierarchy to the so-far best discrete hierarchy. To approximate the expectation of EPH, we use 20 samples, except for Spambase, Letter, and Cifar-100, where we use 10, 1, and 1, respectively, to reduce the runtime. ... Both EPH and FPH are initialized using the average linkage algorithm.
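
To make the sampling-based expectation approximation quoted in the experiment-setup row concrete, here is a minimal sketch of drawing a discrete hierarchy from a probabilistic one and estimating an expected cost by Monte Carlo averaging over such samples. This is a sketch under our own assumptions: the row-stochastic matrices A (leaf-to-internal attachment probabilities) and B (internal-to-internal attachment probabilities), the function names, and the toy cost are illustrative, not the authors' code; plain categorical sampling is also not differentiable, so the paper's gradient treatment is not reproduced here.

    import torch

    def sample_discrete_hierarchy(A: torch.Tensor, B: torch.Tensor):
        # Assumption (ours): A[i, j] is the probability that leaf i attaches
        # to internal node j, and B[j, k] the probability that internal node
        # j attaches to internal node k above it; every row sums to 1.
        leaf_parents = torch.multinomial(A, num_samples=1).squeeze(-1)
        internal_parents = torch.multinomial(B, num_samples=1).squeeze(-1)
        return leaf_parents, internal_parents

    def monte_carlo_expected_cost(A, B, cost_fn, num_samples=20):
        # The quoted setup uses 20 samples by default, and 10 / 1 / 1 on
        # Spambase, Letter, and Cifar-100 to reduce runtime. cost_fn is a
        # hypothetical scorer of one discrete hierarchy (e.g., Dasgupta cost).
        costs = [cost_fn(*sample_discrete_hierarchy(A, B))
                 for _ in range(num_samples)]
        return torch.stack(costs).mean()

    # Toy usage: 100 leaves, 16 internal nodes, purely illustrative cost.
    A = torch.rand(100, 16).softmax(dim=-1)
    B = torch.rand(16, 16).softmax(dim=-1)
    toy_cost = lambda leaf_p, int_p: leaf_p.to(torch.float32).mean()
    print(monte_carlo_expected_cost(A, B, toy_cost))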
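
The setup row also mentions PAdamax (projected Adamax) and decaying the learning rate for B by a factor of 0.1 every 1000 epochs. Below is a minimal sketch of one plausible reading: take an ordinary Adamax step, then project each parameter row back onto the probability simplex (the sorting-based projection of Duchi et al., 2008), with separate learning-rate schedules per parameter group. The parameter shapes, the closed-form toy loss, and all names are our assumptions, not the paper's implementation.

    import torch

    def project_rows_to_simplex(x: torch.Tensor) -> torch.Tensor:
        # Euclidean projection of each row of x onto the probability simplex
        # (sorting-based algorithm of Duchi et al., 2008).
        sorted_x, _ = torch.sort(x, dim=-1, descending=True)
        cssv = sorted_x.cumsum(dim=-1) - 1.0
        ks = torch.arange(1, x.shape[-1] + 1, device=x.device, dtype=x.dtype)
        rho = ((sorted_x - cssv / ks) > 0).sum(dim=-1, keepdim=True)
        theta = cssv.gather(-1, rho - 1) / rho.to(x.dtype)
        return torch.clamp(x - theta, min=0.0)

    # Hypothetical row-stochastic parameters: A (leaf attachment) and B
    # (internal structure), sized for n = 512 internal nodes as in the setup.
    A = torch.rand(1024, 512).softmax(dim=-1).requires_grad_()
    B = torch.rand(512, 512).softmax(dim=-1).requires_grad_()

    # Toy stand-in for the expected cost: for a cost that is linear in the
    # sampled parent choices, the expectation is the probability-weighted
    # sum, which is differentiable. The paper's expected Dasgupta / TSD
    # objectives are more involved and not reproduced here.
    cost_A, cost_B = torch.rand(A.shape), torch.rand(B.shape)
    def expected_cost(A, B):
        return (A * cost_A).sum() + (B * cost_B).sum()

    # Separate parameter groups so only B's learning rate is decayed
    # (x0.1 every 1000 epochs, as in the quoted setup).
    optimizer = torch.optim.Adamax([{"params": [A]}, {"params": [B]}], lr=1e-2)
    scheduler = torch.optim.lr_scheduler.LambdaLR(
        optimizer,
        lr_lambda=[lambda e: 1.0, lambda e: 0.1 ** (e // 1000)])

    for epoch in range(3000):
        optimizer.zero_grad()
        loss = expected_cost(A, B)
        loss.backward()
        optimizer.step()                          # plain Adamax step ...
        with torch.no_grad():                     # ... then the projection,
            A.copy_(project_rows_to_simplex(A))   # which makes it "projected"
            B.copy_(project_rows_to_simplex(B))
        scheduler.step()

The reset to the so-far best discrete hierarchy mentioned in the setup would sit on top of this loop (tracking the best discretization seen so far) and is omitted here.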