Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Expected Probabilistic Hierarchies
Authors: Marcel Kollovieh, Bertrand Charpentier, Daniel Zügner, Stephan Günnemann
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate EPH on synthetic and real-world datasets including vector and graph datasets. EPH outperforms all other approaches quantitatively and provides meaningful hierarchies in qualitative evaluations. |
| Researcher Affiliation | Collaboration | Marcel Kollovieh1,2,3 Bertrand Charpentier4 Daniel Zügner5 Stephan Günnemann1,2,3,4 1 School of Computation, Information and Technology, Technical University of Munich 2 Munich Data Science Institute 3 Munich Center for Machine Learning 4 Pruna AI 5 Microsoft Research AI for Science |
| Pseudocode | Yes | In the following, we provide a formal description of our EPH algorithm, the subgraph sampling, and how we normalize graphs. Algorithm 1 EPH, Algorithm 2 Normalize Graph, Algorithm 3 Sample Subgraph |
| Open Source Code | Yes | Code available at https://www.cs.cit.tum.de/daml/expected-probabilistic-hierarchies |
| Open Datasets | Yes | We use the datasets Polblogs [1], Brain [2], Citeseer [34], Genes [11], Cora-ML [25, 5], Open Flight [29], Wiki Physics [3], and DBLP [38]. ... we selected the seven datasets Zoo, Iris, Glass, Digits, Segmentation, Spambase, and Letter from the UCI Machine Learning repository [15]. Furthermore, we also use Cifar-100 [23]. |
| Dataset Splits | No | Since we are in an unsupervised setting, we have no train/test split, i.e., we train and evaluate on the whole graph. |
| Hardware Specification | Yes | While Hyp HC, FPH, and EPH are executed on a GPU (NVIDIA A100), the remaining methods do not support or do not require GPU acceleration. |
| Software Dependencies | No | The paper mentions optimizers (PAdamax) and refers to some methods' implementations (e.g., Deep Walk embeddings [32]), but does not specify software library versions (e.g., PyTorch 1.x, Python 3.x, CUDA x.x). |
| Experiment Setup | Yes | We use n = 512 internal nodes, compress full hierarchies using the scheme presented by Charpentier and Bonald [8], and use Deep Walk embeddings [32] on the graphs for methods that require features. We train EPH using PAdamax (projected Adamax [21]), reduce the learning rate for B by a factor of 0.1 every 1000 epochs, and reset the probabilistic hierarchy to the so-far best discrete hierarchy. To approximate the expectation of EPH, we use 20 samples, except for Spambase, Letter, and Cifar-100, where we use 10, 1, and 1, respectively, to reduce the runtime. ... Both EPH and FPH are initialized using the average linkage algorithm. |