reproducibilityindex.ai

Fair Hierarchical Clustering

Authors: Sara Ahmadian, Alessandro Epasto, Marina Knittel, Ravi Kumar, Mohammad Mahdian, Benjamin Moseley, Philip Pham, Sergei Vassilvitskii, Yuyan Wang

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirically, we show that our algorithms can find a fair hierarchical clustering, with only a negligible loss in the objective. [...] Finally, we conclude with an empirical evaluation of our approach. We show that ignoring protected attributes when performing hierarchical clustering can lead to unfair clusters. On the other hand, adopting the fairlet framework in conjunction with the approximation algorithms we propose yields fair clusters with a negligible objective degradation. [...] This section validates our algorithms from Sections 4 and 5 empirically. [...] We use two datasets from the UCI data repository. [...] We present results for value here, the results for revenue are qualitatively similar. [...] In Table 2, we show for each dataset the ratiovalue both at the time of initialization (Initial) and after using the local search algorithm (Final). [...] In Figure 1(iii), we show the average running time on Census data for both the original average-linkage and the fair average-linkage algorithms.
Researcher Affiliation	Collaboration	Sara Ahmadian Google sahmadian@google.com Alessandro Epasto Google aepasto@google.com Marina Knittel University of Maryland mknittel@cs.umd.edu Ravi Kumar Google ravi.k53@gmail.com Mohammad Mahdian Google mahdian@google.com Benjamin Moseley Carnegie Mellon University moseleyb@andrew.cmu.edu Philip Pham Google phillypham@google.com Sergei Vassilvitskii Google sergeiv@google.com Yuyan Wang Carnegie Mellon University yuyanw@andrew.cmu.edu
Pseudocode	Yes	Algorithm 1 Algorithm for ( /n)-locally-optimal fairlet decomposition. Input: A set V with distance function d 0, parameter , small constant 2 [0, 1]. Output: An -capped fairlet decomposition Y.
Open Source Code	Yes	The code is available in the Supplementary Material.
Open Datasets	Yes	We use two datasets from the UCI data repository.3 [...] 3archive.ics.uci.edu/ml/index.php, Census: archive.ics.uci.edu/ml/datasets/census+ income, Bank: archive.ics.uci.edu/ml/datasets/Bank+Marketing
Dataset Splits	No	The paper mentions sub-sampling the datasets and performing 5 experiments per sample size but does not provide specific train/validation/test splits, percentages, or explicit validation strategies.
Hardware Specification	No	The paper mentions running time experiments on Census data but does not specify any hardware details like GPU models, CPU types, or memory.
Software Dependencies	No	The paper states that the code is available in the Supplementary Material but does not list any specific software dependencies with version numbers.
Experiment Setup	Yes	We set in Algorithm 1 to 0.1 in all of the experiments. [...] We apply the average-linkage algorithm to create a tree on the fairlets. We further use average-linkage to create subtrees inside of each fairlet. [...] The algorithm selects a random pair of blue or red points in different fairlets to swap, and checks if the swap sufﬁciently improves the objective. We do not run the algorithm until all the pairs are checked, rather the algorithm stops if it has made a 2n failed attempts to swap a random pair.