Generalized Reductions: Making any Hierarchical Clustering Fair and Balanced with Low Cost

Authors: Marina Knittel, Max Springer, John P Dickerson, Mohammadtaghi Hajiaghayi

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our results using Dasgupta's cost function, perhaps one of the most prevalent theoretical metrics for hierarchical clustering evaluation. Our work vastly improves the previous O(n^{5/6} polylog(n)) fair approximation for cost to a near-polylogarithmic O(n^δ polylog(n)) fair approximation for any constant δ ∈ (0, 1). This result establishes a cost-fairness tradeoff and extends to broader fairness constraints than the previous work. We also show how to alter existing hierarchical clusterings to guarantee fairness and cluster balance across any level in the hierarchy. ... Section 5. Experiments. This section validates our algorithms from Section 4. Our simulations demonstrate that our algorithm incurs only a modest loss in the hierarchical clustering objective and exhibits increased fairness. Specifically, the approximate cost increases as a function of Algorithm 3's defining parameters: c, δ, and k. ... We additionally illustrate the resulting balance of our hierarchical clustering algorithm by presenting the distribution of the cluster ratios of the projected features (blue to red points) in Figure 5 for the Census data.
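For readers unfamiliar with the objective quoted above, the following is a minimal sketch of Dasgupta's cost function for a binary hierarchical clustering: each pair of points contributes its similarity weight times the number of leaves under the pair's lowest common ancestor. The tree encoding and names here are illustrative, not the paper's implementation.

```python
# A tree node is either a leaf (an int point id) or a tuple (left, right).

def leaves(node):
    """Return the set of leaf ids under a node."""
    if isinstance(node, int):
        return {node}
    left, right = node
    return leaves(left) | leaves(right)

def dasgupta_cost(node, w):
    """Dasgupta's cost: every pair split at this node pays
    |leaves(node)| * w(i, j); recurse into both children."""
    if isinstance(node, int):
        return 0.0
    left, right = node
    n_leaves = len(leaves(node))
    cross = sum(w[min(i, j), max(i, j)]
                for i in leaves(left) for j in leaves(right))
    return n_leaves * cross + dasgupta_cost(left, w) + dasgupta_cost(right, w)

# Toy example: 4 points with high within-cluster similarity,
# clustered as ((0, 1), (2, 3)).
w = {(0, 1): 1.0, (2, 3): 1.0, (0, 2): 0.1, (0, 3): 0.1,
     (1, 2): 0.1, (1, 3): 0.1}
tree = ((0, 1), (2, 3))
print(dasgupta_cost(tree, w))
```

A good tree separates dissimilar points near the root, so the large leaf-count factor multiplies only small weights; this is why low cost corresponds to a high-quality hierarchy.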
Researcher Affiliation | Academia | Department of Computer Science, University of Maryland, College Park, USA. Correspondence to: Marina Knittel <mknittel@umd.edu>, Max Springer <mss423@umd.edu>, John Dickerson <john@cs.umd.edu>, Mohammad Hajiaghayi <hajiagha@cs.umd.edu>.
Pseudocode | Yes | Algorithm 1: Rebalance Tree; Algorithm 2: Refine Rebalance Tree; Algorithm 3: Fair HC; Algorithm 4: Subtree Search
Open Source Code | Yes | Implementation. The Python code for the following experiments is available in the Supplementary Material.
Open Datasets | Yes | Datasets. We use two data sets, Census and Bank, from the UCI data repository (Dua & Graff, 2017).
Dataset Splits | No | The paper does not provide specific training, validation, or test dataset splits with percentages, counts, or references to predefined splits.
Hardware Specification | No | The paper does not specify any particular hardware (e.g., GPU models, CPU types, memory) used for running the experiments.
Software Dependencies | No | The paper mentions 'Python code' but does not specify any software names with version numbers for libraries or frameworks used in the experiments.
Experiment Setup | Yes | We vary the parameters c ∈ {2^i} for i = 0, ..., 5, δ (in increments of 1/8), and k ∈ {2^i} for i = 1, ..., 4 to experimentally validate their theoretical impact on the approximate guarantees of Section 4. ... Parameters were set to c = 4, δ = 3/8, k = 4 for the above clustering result.
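The parameter sweep described above can be sketched as a simple grid enumeration. The ranges for c and k follow the quoted setup; the exact set of δ values is an assumption here (multiples of 1/8 in (0, 1)), since the report only gives the increment.

```python
from itertools import product

cs = [2**i for i in range(6)]          # c in {1, 2, 4, 8, 16, 32}
deltas = [i / 8 for i in range(1, 8)]  # δ in increments of 1/8 (assumed range (0, 1))
ks = [2**i for i in range(1, 5)]       # k in {2, 4, 8, 16}

grid = list(product(cs, deltas, ks))
print(len(grid))  # 6 * 7 * 4 = 168 configurations

# The highlighted setting from the report is one point in this grid.
assert (4, 3/8, 4) in grid
```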