Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Clustering with Tangles: Algorithmic Framework and Theoretical Guarantees

Authors: Solveig Klepper, Christian Elbracht, Diego Fioravanti, Jakob Kneip, Luca Rendsburg, Maximilian Teegen, Ulrike von Luxburg

JMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental In our paper we construct the algorithmic framework for clustering with tangles, prove theoretical guarantees in various settings, and provide extensive simulations and use cases. Python code is available on github. ... Simulations and experiments. To demonstrate the flexibility of the tangle approach, we provide case studies in three different scenarios: a questionnaire scenario in Section 4, a graph clustering scenario in Section 5, and a feature-based scenario in Section 6.
Researcher Affiliation Academia 1 Department of Computer Science and Tübingen AI Center, University of Tübingen, Germany; 2 Department of Mathematics, University of Hamburg, Germany
Pseudocode Yes Algorithm 1: tangle search tree ... Algorithm 2: Generate the initial set of cuts ... Algorithm 3: add orientation to tangle ... Algorithm 4: post-processing the tangle search tree
Open Source Code Yes Python code is available on GitHub. ... Python package. We implemented the central part of the algorithm and different options for pre- and post-processing. The code and basic examples are publicly available at: https://github.com/tml-tuebingen/tangles/tree/vanilla.
Open Datasets Yes As a simple instance, we chose the Narcissistic Personality Inventory questionnaire (Raskin, 1988), sometimes abbreviated npi in the following. Raskin and Hall developed the test in 1979, and it has since become one of the most widely utilized personality measures for non-clinical levels of the trait narcissism. The dataset is accessible via https://openpsychometrics.org/_rawdata/
Dataset Splits No We use our new score sτ to sample a subset of participants that is balanced in terms of the score sτ: we randomly sample 18 participants that have score sτ = 0, another 18 participants that have score sτ = 1, and so forth. This results in a subset of 18 × 40 = 720 participants. ... We average the results over ten random instances of the proposed model.
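The balanced-subsampling scheme quoted above (18 participants per score level, 40 levels, 720 total) can be sketched as follows. This is a minimal illustration, not the authors' code; the synthetic `scores` array is a hypothetical stand-in for the participants' sτ values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in: integer scores s_tau in {0, ..., 39} for 10,000 participants.
scores = rng.integers(0, 40, size=10_000)

# Draw 18 participants (without replacement) from each score level,
# giving a subset balanced in s_tau with 18 * 40 = 720 participants.
subset = np.concatenate(
    [rng.choice(np.flatnonzero(scores == s), size=18, replace=False)
     for s in range(40)]
)
print(len(subset))  # 720
```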
Hardware Specification No No specific hardware details (GPU/CPU models, memory, etc.) are mentioned for running the experiments. The paper discusses computational complexity and runtime performance generally but without hardware specifics.
Software Dependencies No Python package. We implemented the central part of the algorithm and different options for pre- and post-processing. ... As a clustering baseline, we apply the k-means algorithm to the answer vectors of the participants, interpreting them as points in a Euclidean space. ... We compare tangles to the k-means clustering algorithm as implemented in sklearn.
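The k-means baseline described above amounts to running sklearn's implementation on the answer vectors directly. A minimal sketch, assuming binary answer vectors and an arbitrary cluster count of 3 (the data and parameters here are placeholders, not the paper's):

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-in for questionnaire data: binary answer vectors
# for 300 participants over 40 questions, treated as points in Euclidean space.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(300, 40)).astype(float)

# k-means baseline via the sklearn implementation, as in the paper's comparison.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(labels.shape)  # (300,)
```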
Experiment Setup Yes If not stated otherwise, in our algorithmic setup, we use the bipartitions induced by all questions and choose a to be 1/3 of the size of the smallest cluster. We choose the average Hamming similarity, stated in Equation (4), to assign a cost to the bipartitions. ... We set our agreement parameter a to 150 and prune paths of length one. ... We choose the agreement parameter for the algorithm to be 1/3 of the size of the smallest cluster, which is a rough lower bound. We do not choose a threshold value for Ψ for the tangle algorithm but use all bipartitions generated by the pre-processing. ... We use the slicing Algorithm 2 described in Section 6.0.1. As a cost function, we use c((A, A̅)) = ∑_{v∈A, u∈A̅} ||v − u||.
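The cost function quoted at the end, c((A, A̅)) = ∑_{v∈A, u∈A̅} ||v − u||, sums the Euclidean distances between every pair of points on opposite sides of a bipartition. A minimal sketch (assuming points as rows of a NumPy array and a boolean mask selecting side A; this is an illustration, not the authors' implementation):

```python
import numpy as np

def bipartition_cost(X, mask):
    """Cost of the bipartition (A, A-bar): the sum of Euclidean distances
    between all cross-side pairs, sum_{v in A, u in A-bar} ||v - u||."""
    A, B = X[mask], X[~mask]
    # Broadcast to all (|A|, |A-bar|) pairwise difference vectors.
    diffs = A[:, None, :] - B[None, :, :]
    return np.linalg.norm(diffs, axis=-1).sum()

# Two well-separated pairs of points; the natural bipartition has cost
# 3 + sqrt(10) + sqrt(10) + 3 = 6 + 2*sqrt(10).
X = np.array([[0.0, 0.0], [0.0, 1.0], [3.0, 0.0], [3.0, 1.0]])
mask = np.array([True, True, False, False])
print(bipartition_cost(X, mask))  # 12.324...
```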