Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Fair Hierarchical Clustering

Authors: Sara Ahmadian, Alessandro Epasto, Marina Knittel, Ravi Kumar, Mohammad Mahdian, Benjamin Moseley, Philip Pham, Sergei Vassilvitskii, Yuyan Wang

NeurIPS 2020 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Empirically, we show that our algorithms can find a fair hierarchical clustering, with only a negligible loss in the objective. [...] Finally, we conclude with an empirical evaluation of our approach. We show that ignoring protected attributes when performing hierarchical clustering can lead to unfair clusters. On the other hand, adopting the fairlet framework in conjunction with the approximation algorithms we propose yields fair clusters with a negligible objective degradation. [...] This section validates our algorithms from Sections 4 and 5 empirically. [...] We use two datasets from the UCI data repository. [...] We present results for value here, the results for revenue are qualitatively similar. [...] In Table 2, we show for each dataset the ratiovalue both at the time of initialization (Initial) and after using the local search algorithm (Final). [...] In Figure 1(iii), we show the average running time on Census data for both the original average-linkage and the fair average-linkage algorithms.
Researcher Affiliation Collaboration Sara Ahmadian Google EMAIL Alessandro Epasto Google EMAIL Marina Knittel University of Maryland EMAIL Ravi Kumar Google EMAIL Mohammad Mahdian Google EMAIL Benjamin Moseley Carnegie Mellon University EMAIL Philip Pham Google EMAIL Sergei Vassilvitskii Google EMAIL Yuyan Wang Carnegie Mellon University EMAIL
Pseudocode Yes Algorithm 1 Algorithm for ( /n)-locally-optimal fairlet decomposition. Input: A set V with distance function d 0, parameter , small constant 2 [0, 1]. Output: An -capped fairlet decomposition Y.
Open Source Code Yes The code is available in the Supplementary Material.
Open Datasets Yes We use two datasets from the UCI data repository.3 [...] 3archive.ics.uci.edu/ml/index.php, Census: archive.ics.uci.edu/ml/datasets/census+ income, Bank: archive.ics.uci.edu/ml/datasets/Bank+Marketing
Dataset Splits No The paper mentions sub-sampling the datasets and performing 5 experiments per sample size but does not provide specific train/validation/test splits, percentages, or explicit validation strategies.
Hardware Specification No The paper mentions running time experiments on Census data but does not specify any hardware details like GPU models, CPU types, or memory.
Software Dependencies No The paper states that the code is available in the Supplementary Material but does not list any specific software dependencies with version numbers.
Experiment Setup Yes We set in Algorithm 1 to 0.1 in all of the experiments. [...] We apply the average-linkage algorithm to create a tree on the fairlets. We further use average-linkage to create subtrees inside of each fairlet. [...] The algorithm selects a random pair of blue or red points in different fairlets to swap, and checks if the swap sufficiently improves the objective. We do not run the algorithm until all the pairs are checked, rather the algorithm stops if it has made a 2n failed attempts to swap a random pair.