Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Directed Probabilistic Watershed

Authors: Enrique Fita Sanmartín, Sebastian Damrich, Fred A. Hamprecht

NeurIPS 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	we run an illustrative experiment to show the performance of the DProb WS on node classification. ... We compare the DProb WS with the methods exposed in [9, 11, 49] referred as ARW, GTG and LLUD respectively.
Researcher Affiliation	Academia	Enrique Fita Sanmartín, Sebastian Damrich, Fred A. Hamprecht, HCI/IWR at Heidelberg University, 69120 Heidelberg, Germany, {enrique.fita.sanmartin, sebastian.damrich, fred.hamprecht}@iwr.uni-heidelberg.de
Pseudocode	Yes	Algorithm 1: DProb WS
Open Source Code	Yes	Code publicly available at https://github.com/hci-unihd/Directed_Probabilistic_Watershed.git
Open Datasets	Yes	We construct kNN graphs, with k = 5, from the UCI datasets [10] Digits [44] and 20Newsgroups [23]. Additionally we consider the Email-EU network [26, 27, 46], the Cora network [29] and CiteseerX network [33].
Dataset Splits	No	The paper describes how labeled nodes are sampled as seeds ('sampling a certain fraction r of all nodes from each class uniformly as seeds') but does not specify distinct training, validation, and test splits or a cross-validation setup.
Hardware Specification	No	The paper does not specify the hardware used for the experiments (e.g., GPU models, CPU types, or memory specifications).
Software Dependencies	No	The paper does not specify any software dependencies with version numbers.
Experiment Setup	Yes	We construct kNN graphs, with k = 5... Inspired by [9], we sample a certain fraction r of all nodes from each class uniformly as seeds. In Figure 3, we show the average accuracy over 20 runs for each of the r values between 0.1 and 0.9.
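The Experiment Setup row describes the evaluation protocol only in outline: kNN graphs with k = 5, a fraction r of each class sampled uniformly as seeds, and accuracy averaged over 20 runs for r from 0.1 to 0.9. The following is a minimal plain-Python sketch of that protocol under stated assumptions. It is not the authors' code and does not implement DProb WS. The edge orientation (i → j when j is among the k nearest neighbors of i), Euclidean distance, rounding of per-class seed counts, scoring accuracy on non-seed nodes only, the toy data, and the stand-in classifier `nearest_seed_bfs` are all assumptions, and every function name here is hypothetical.

```python
import math
import random
from collections import deque
from statistics import mean


def knn_graph(points, k=5):
    """Directed kNN graph (assumed orientation): node i -> its k nearest other
    nodes by Euclidean distance. Returns {node: [neighbors, closest first]}."""
    graph = {}
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        graph[i] = [j for _, j in dists[:k]]
    return graph


def sample_seeds(labels, r, rng):
    """Per class, draw a fraction r of its nodes uniformly without replacement.
    Assumption: counts use Python's round() and each class gets at least one seed."""
    by_class = {}
    for node, c in enumerate(labels):
        by_class.setdefault(c, []).append(node)
    seeds = {}
    for c, nodes in by_class.items():
        n_seeds = min(len(nodes), max(1, round(r * len(nodes))))
        for node in rng.sample(nodes, n_seeds):
            seeds[node] = c
    return seeds


def nearest_seed_bfs(graph, seeds):
    """Hypothetical stand-in classifier, NOT DProb WS: walk out-edges breadth-first
    from each node and return the label of the first seed reached (None if none)."""
    preds = []
    for start in range(len(graph)):
        seen, queue, label = {start}, deque([start]), None
        while queue:
            u = queue.popleft()
            if u in seeds:
                label = seeds[u]
                break
            for v in graph[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        preds.append(label)
    return preds


def average_accuracy(graph, labels, predict, r, runs=20, seed=0):
    """Mean accuracy over `runs` seed samplings, scored on non-seed nodes only."""
    rng = random.Random(seed)
    scores = []
    for _ in range(runs):
        seeds = sample_seeds(labels, r, rng)
        preds = predict(graph, seeds)
        test = [i for i in range(len(labels)) if i not in seeds]
        if test:  # skip a run in which every node became a seed
            scores.append(sum(preds[i] == labels[i] for i in test) / len(test))
    return mean(scores) if scores else float("nan")


# Toy data: two well-separated 2-D clusters of six points each (not from the paper).
points = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5), (0, 2),
          (10, 10), (11, 10), (10, 11), (11, 11), (10.5, 10.5), (10, 12)]
labels = [0] * 6 + [1] * 6

graph = knn_graph(points, k=5)
r_values = [round(0.1 * i, 1) for i in range(1, 10)]  # 0.1, 0.2, ..., 0.9
curve = {r: average_accuracy(graph, labels, nearest_seed_bfs, r) for r in r_values}
print(curve)  # every accuracy is 1.0 here: each node's 5 out-edges stay inside its cluster
```

Passing the classifier as a callable keeps the seed-sampling and averaging loop separate from the method under evaluation, so that, in principle, ARW, GTG, LLUD, or a DProb WS implementation could be compared on identical seed draws by reusing the same `seed` argument.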