Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Clustering with Tangles: Algorithmic Framework and Theoretical Guarantees

Authors: Solveig Klepper, Christian Elbracht, Diego Fioravanti, Jakob Kneip, Luca Rendsburg, Maximilian Teegen, Ulrike von Luxburg

JMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental In our paper we construct the algorithmic framework for clustering with tangles, prove theoretical guarantees in various settings, and provide extensive simulations and use cases. Python code is available on github. ... Simulations and experiments. To demonstrate the flexibility of the tangle approach, we provide case studies in three different scenarios: a questionnaire scenario in Section 4, a graph clustering scenario in Section 5, and a feature-based scenario in Section 6.
Researcher Affiliation Academia 1 Department of Computer Science and Tübingen AI Center, University of Tübingen, Germany; 2 Department of Mathematics, University of Hamburg, Germany
Pseudocode Yes Algorithm 1: tangle search tree ... Algorithm 2: Generate the initial set of cuts ... Algorithm 3: add orientation to tangle ... Algorithm 4: post-processing the tangle search tree
Open Source Code Yes Python code is available on GitHub. ... Python package. We implemented the central part of the algorithm and different options for pre- and post-processing. The code and basic examples are publicly available at: https://github.com/tml-tuebingen/tangles/tree/vanilla.
Open Datasets Yes As a simple instance, we chose the Narcissistic Personality Inventory questionnaire (Raskin, 1988), sometimes abbreviated npi in the following. Raskin and Hall developed the test in 1979, and it has since become one of the most widely utilized personality measures for non-clinical levels of the trait narcissism. The dataset is accessible via https://openpsychometrics.org/_rawdata/
Dataset Splits No We use our new score sτ to sample a subset of participants that is balanced in terms of the score sτ: we randomly sample 18 participants that have score sτ = 0, another 18 participants that have score sτ = 1, and so forth. This results in a subset of 18 × 40 = 720 participants. ... We average the results over ten random instances of the proposed model.
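The balanced-subsampling scheme quoted above (18 participants per score level, 40 levels, 720 total) can be sketched as follows. This is a minimal illustration, not the authors' code; the synthetic `scores` array is a hypothetical stand-in for the participants' sτ values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in: integer scores s_tau in {0, ..., 39} for 10,000 participants.
scores = rng.integers(0, 40, size=10_000)

# Draw 18 participants (without replacement) from each score level,
# giving a subset balanced in s_tau with 18 * 40 = 720 participants.
subset = np.concatenate(
    [rng.choice(np.flatnonzero(scores == s), size=18, replace=False)
     for s in range(40)]
)
print(len(subset))  # 720
```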
Hardware Specification No No specific hardware details (GPU/CPU models, memory, etc.) are mentioned for running the experiments. The paper discusses computational complexity and runtime performance generally but without hardware specifics.
Software Dependencies No Python package. We implemented the central part of the algorithm and different options for pre- and post-processing. ... As a clustering baseline, we apply the k-means algorithm to the answer vectors of the participants, interpreting them as points in a Euclidean space. ... We compare tangles to the k-means clustering algorithm as implemented in sklearn.
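The k-means baseline described above amounts to running sklearn's implementation on the answer vectors directly. A minimal sketch, assuming binary answer vectors and an arbitrary cluster count of 3 (the data and parameters here are placeholders, not the paper's):

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-in for questionnaire data: binary answer vectors
# for 300 participants over 40 questions, treated as points in Euclidean space.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(300, 40)).astype(float)

# k-means baseline via the sklearn implementation, as in the paper's comparison.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(labels.shape)  # (300,)
```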
Experiment Setup Yes If not stated otherwise, in our algorithmic setup, we use the bipartitions induced by all questions and choose a to be 1/3 of the size of the smallest cluster. We choose the average Hamming similarity, stated in Equation (4), to assign a cost to the bipartitions. ... We set our agreement parameter a to 150 and prune paths of length one. ... We choose the agreement parameter for the algorithm to be 1/3 of the size of the smallest cluster, which is a rough lower bound. We do not choose a threshold value for Ψ for the tangle algorithm but use all bipartitions generated by the pre-processing. ... We use the slicing Algorithm 2 described in Section 6.0.1. As a cost function, we use c((A, A̅)) = ∑_{v∈A, u∈A̅} ||v − u||.
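The cost function quoted at the end, c((A, A̅)) = ∑_{v∈A, u∈A̅} ||v − u||, sums the Euclidean distances between every pair of points on opposite sides of a bipartition. A minimal sketch (assuming points as rows of a NumPy array and a boolean mask selecting side A; this is an illustration, not the authors' implementation):

```python
import numpy as np

def bipartition_cost(X, mask):
    """Cost of the bipartition (A, A-bar): the sum of Euclidean distances
    between all cross-side pairs, sum_{v in A, u in A-bar} ||v - u||."""
    A, B = X[mask], X[~mask]
    # Broadcast to all (|A|, |A-bar|) pairwise difference vectors.
    diffs = A[:, None, :] - B[None, :, :]
    return np.linalg.norm(diffs, axis=-1).sum()

# Two well-separated pairs of points; the natural bipartition has cost
# 3 + sqrt(10) + sqrt(10) + 3 = 6 + 2*sqrt(10).
X = np.array([[0.0, 0.0], [0.0, 1.0], [3.0, 0.0], [3.0, 1.0]])
mask = np.array([True, True, False, False])
print(bipartition_cost(X, mask))  # 12.324...
```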