Proportionally Fair Clustering

Authors: Xingyu Chen, Brandon Fain, Liang Lyu, Kamesh Munagala

ICML 2019

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We conclude with an empirical examination of the tradeoff between proportional solutions and the k-means objective." |
| Researcher Affiliation | Academia | Xingyu Chen, Brandon Fain, Liang Lyu, Kamesh Munagala: Department of Computer Science, Duke University, Durham, North Carolina, USA. |
| Pseudocode | Yes | Algorithm 1 (Greedy Capture); Algorithm 2 (Local Capture Heuristic). Hedged sketches of both appear after this table. |
| Open Source Code | No | The paper contains no explicit statement about releasing source code for the methodology described, nor does it link to a code repository. |
| Open Datasets | Yes | "In this section, we study proportionality on real data taken from the UCI Machine Learning Repository (Dheeru & Karra Taniskidou, 2017). We consider three qualitatively different data sets used for clustering: Iris, Diabetes, and KDD." |
| Dataset Splits | No | The paper uses the Iris, Diabetes, and KDD datasets but gives no training/validation/test splits. It states only that "we work with a subsample of 100,000 points" for KDD, with further sampling for the Local Capture algorithm: "sampling 5,000 points uniformly at random to treat as N and sampling 400 points via the k-means++ initialization to treat as M". |
| Hardware Specification | No | The paper does not report the hardware (e.g., CPU/GPU models or memory) used to run the experiments. |
| Software Dependencies | No | The paper does not list software dependencies with version numbers, which would be needed to replicate the experiment environment. |
| Experiment Setup | Yes | "In our experiments (see Section 5), we search for the minimum ρ for which the algorithm terminates in a small number of iterations via binary search over possible input of ρ. For efficiency, we run our Local Capture algorithm by further sampling 5,000 points uniformly at random to treat as N and sampling 400 points via the k-means++ initialization to treat as M. The k-means objective is measured on the original 100,000 points for both algorithms." A sketch of this binary search appears after the table. |