Approximate Group Fairness for Clustering

Authors: Bo Li, Lijun Li, Ankang Sun, Chenhao Wang, Yingfan Wang

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, in Section 6, we conduct experiments to examine the performance of our algorithms. We note that our algorithms have good theoretical guarantees in the worst case, but they may not find the fairest clustering for every instance. Accordingly, we first propose a two-stage algorithm to refine the clusters and then use synthetic and real-world data sets to show how much it outperforms classic ones regarding core fairness.
Researcher Affiliation | Academia | 1 Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China; 2 School of Mathematical Sciences, Ocean University of China, Qingdao, China; 3 Warwick Business School, University of Warwick, United Kingdom; 4 University of Nebraska-Lincoln, United States; 5 Department of Computer Science, Duke University, United States.
Pseudocode | Yes | Algorithm 1 ALG_l(λ) for Line. ... Algorithm 2 ALG_t(λ) for Tree. ... Algorithm 3 ALG_g for General Metric Space. ... Algorithm 4 ALG_g^+(obj) for General Metric Space.
Open Source Code | No | The paper does not contain an explicit statement about releasing the source code for the described methodology, nor a link to a code repository.
Open Datasets | Yes | (2) Mopsi locations in clustering benchmark datasets (Fränti and Sieranoja, 2018) (real-world): a set of 2-D locations for n = 6014 users in Joensuu. ... Pasi Fränti and Sami Sieranoja. 2018. K-means properties on six clustering benchmark datasets. Appl. Intell. 48, 12 (2018), 4743–4759. http://cs.uef.fi/sipu/datasets/
Dataset Splits | No | The paper describes the datasets used and the range of k values for clustering, but it does not specify any training, validation, or test splits.
Hardware Specification | No | The paper describes the algorithms and experiments but does not provide any details about the hardware (e.g., CPU or GPU models) used to run the experiments.
Software Dependencies | No | The paper mentions implementing algorithms such as k-means++ but does not specify any software dependencies or library versions (e.g., Python, PyTorch, TensorFlow) that would be needed for replication.
Experiment Setup | Yes | For a range of k = 8, ..., 17 (horizontal axis), (c) and (d) (resp. (e) and (f)) compare the fairness and efficiency in the Gaussian dataset (resp. Mopsi locations). We want to build k = 10 centers to serve the nodes;
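Since no code is released, the sketch below illustrates one way to reproduce the classic baseline referenced in the Software Dependencies and Experiment Setup rows: running k-means++ on the public Mopsi locations benchmark for k = 8, ..., 17. It is a minimal sketch, not the authors' code; the file name "mopsi-joensuu.txt" and its whitespace format are assumptions, and the paper's two-stage core-fairness refinement and fairness metric are not reproduced here.

```python
import numpy as np
from sklearn.cluster import KMeans

# Assumed local copy of the benchmark file from http://cs.uef.fi/sipu/datasets/,
# one "x y" coordinate pair per line. File name and format are guesses.
points = np.loadtxt("mopsi-joensuu.txt")   # expected shape: (6014, 2) per the paper

for k in range(8, 18):  # k = 8, ..., 17, matching the reported experiment range
    km = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=0)
    labels = km.fit_predict(points)
    # km.inertia_ is the k-means cost (sum of squared distances to assigned centers),
    # used here only as a stand-in efficiency measure; the core-fairness comparison
    # from the paper is not computed in this sketch.
    print(f"k={k:2d}  cost={km.inertia_:.3e}  sizes={np.bincount(labels)}")
```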