Fair, Polylog-Approximate Low-Cost Hierarchical Clustering

Authors: Marina Knittel, Max Springer, John Dickerson, MohammadTaghi Hajiaghayi

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental This section validates the theoretical guarantees of Algorithm 2. Specifically, we demonstrate that modifying an unfair hierarchical clustering using the presented procedure yields a fair hierarchy that incurs only a modest increase in cost. ... In Figure 3, we depict the cluster balances of an unfair hierarchical clustering algorithm, namely average-linkage, and subsequently demonstrate that our algorithm effectively concentrates all clusters around the underlying data balance.
Researcher Affiliation Academia Marina Knittel Max Springer John Dickerson Mohammad Taghi Hajiaghayi Department of Computer Science University of Maryland, College Park {mknittel,mss423}@umd.edu {john,hajiagha}@cs.umd.edu
Pseudocode Yes Algorithm 1 Split Root" and "Algorithm 2 Make Fair
Open Source Code Yes The Python code for the following experiments are available in the Supplementary Material.
Open Datasets Yes We use two data sets, Census and Bank, from the UCI data repository Dua and Graff [2017].
Dataset Splits No The paper mentions using samples of various sizes and conducting independent replications, but does not specify explicit training, validation, or test dataset splits, exact percentages, or sample counts for reproducibility.
Hardware Specification No The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, memory amounts, or detailed computer specifications used for running its experiments.
Software Dependencies No The paper mentions that Python code is available in the supplementary material, but it does not specify exact version numbers for Python or any other key software libraries, solvers, or dependencies used in the experiments.
Experiment Setup Yes We vary the parameters (c, h, k, ")1 to experimentally assess their theoretical impact on the approximate guarantees of Section 3. ... Figure 3 depicts: (B) the resultant cluster balances after running Algorithm 2 with parameters (c, h, k, ") = (8, 4, 2, 1/c log2 n), (C) cluster balances after tuning c = 4, (D) cluster balances after further tuning c = 2.