Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Individual Fairness for k-Clustering
Authors: Sepideh Mahabadi, Ali Vakilian
ICML 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we provide an empirical evaluation of our local search based algorithm for α-fair k-clustering with k-median and k-means cost functions. We run experiments on three datasets (Diabetes, Bank, Census) that have been previously used in the context of fair clustering. |
| Researcher Affiliation | Academia | 1TTIC, Chicago, Illinois, USA 2University of Wisconsin, Madison, Wisconsin, USA. Correspondence to: Sepideh Mahabadi <EMAIL>, Ali Vakilian <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Finds a set of critical balls of size at most k; Algorithm 2 local search algorithm w.r.t. critical balls |
| Open Source Code | No | The paper does not provide any concrete access information such as a specific repository link, an explicit statement about code release, or mention of code in supplementary materials for the methodology described. |
| Open Datasets | Yes | We consider three datasets from UCI Machine Learning Repository (2017) which are standard benchmarks for clustering algorithms and in particular they were used in the context of fair k-median clustering in (Chierichetti et al., 2017; Chen et al., 2019; Backurs et al., 2019; Bera et al., 2019; Huang et al., 2019). |
| Dataset Splits | No | The paper mentions using subsets of data for experiments ('we randomly sample a subset of size 1000 points from the data set'), but it does not specify explicit training, validation, or test dataset splits or percentages. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, or specific solvers). |
| Experiment Setup | Yes | In our experiments, we follow the description of Algorithm 1 and 2. The only discrepancy is that instead of considering a point to be covered if it has a center within distance of 6 times its fair radius, in our implementation, we consider a point covered if it has a center within distance of 3 times its fair radius (see line 6 in Algorithm 1). In all experiments, the input parameter α to our local search algorithms (i.e., the desired fairness guarantee) is the fairness approximation 1 η 2 returned by the FAIRKCENTER algorithm of (Jung et al., 2019). Finally, we consider values of k to be in range 5 to 30 with steps of size 5 and draw our plots as a function of k. |