Fair Hierarchical Clustering
Authors: Sara Ahmadian, Alessandro Epasto, Marina Knittel, Ravi Kumar, Mohammad Mahdian, Benjamin Moseley, Philip Pham, Sergei Vassilvitskii, Yuyan Wang
NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we show that our algorithms can find a fair hierarchical clustering, with only a negligible loss in the objective. [...] Finally, we conclude with an empirical evaluation of our approach. We show that ignoring protected attributes when performing hierarchical clustering can lead to unfair clusters. On the other hand, adopting the fairlet framework in conjunction with the approximation algorithms we propose yields fair clusters with a negligible objective degradation. [...] This section validates our algorithms from Sections 4 and 5 empirically. [...] We use two datasets from the UCI data repository. [...] We present results for value here, the results for revenue are qualitatively similar. [...] In Table 2, we show for each dataset the ratiovalue both at the time of initialization (Initial) and after using the local search algorithm (Final). [...] In Figure 1(iii), we show the average running time on Census data for both the original average-linkage and the fair average-linkage algorithms. |
| Researcher Affiliation | Collaboration | Sara Ahmadian Google sahmadian@google.com Alessandro Epasto Google aepasto@google.com Marina Knittel University of Maryland mknittel@cs.umd.edu Ravi Kumar Google ravi.k53@gmail.com Mohammad Mahdian Google mahdian@google.com Benjamin Moseley Carnegie Mellon University moseleyb@andrew.cmu.edu Philip Pham Google phillypham@google.com Sergei Vassilvitskii Google sergeiv@google.com Yuyan Wang Carnegie Mellon University yuyanw@andrew.cmu.edu |
| Pseudocode | Yes | Algorithm 1 Algorithm for ( /n)-locally-optimal fairlet decomposition. Input: A set V with distance function d 0, parameter , small constant 2 [0, 1]. Output: An -capped fairlet decomposition Y. |
| Open Source Code | Yes | The code is available in the Supplementary Material. |
| Open Datasets | Yes | We use two datasets from the UCI data repository.3 [...] 3archive.ics.uci.edu/ml/index.php, Census: archive.ics.uci.edu/ml/datasets/census+ income, Bank: archive.ics.uci.edu/ml/datasets/Bank+Marketing |
| Dataset Splits | No | The paper mentions sub-sampling the datasets and performing 5 experiments per sample size but does not provide specific train/validation/test splits, percentages, or explicit validation strategies. |
| Hardware Specification | No | The paper mentions running time experiments on Census data but does not specify any hardware details like GPU models, CPU types, or memory. |
| Software Dependencies | No | The paper states that the code is available in the Supplementary Material but does not list any specific software dependencies with version numbers. |
| Experiment Setup | Yes | We set in Algorithm 1 to 0.1 in all of the experiments. [...] We apply the average-linkage algorithm to create a tree on the fairlets. We further use average-linkage to create subtrees inside of each fairlet. [...] The algorithm selects a random pair of blue or red points in different fairlets to swap, and checks if the swap sufficiently improves the objective. We do not run the algorithm until all the pairs are checked, rather the algorithm stops if it has made a 2n failed attempts to swap a random pair. |