Proportionally Fair Clustering
Authors: Xingyu Chen, Brandon Fain, Liang Lyu, Kamesh Munagala
ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conclude with an empirical examination of the tradeoff between proportional solutions and the k-means objective. |
| Researcher Affiliation | Academia | Xingyu Chen 1 Brandon Fain 1 Liang Lyu 1 Kamesh Munagala 1 Department of Computer Science, Duke University, Durham, North Carolina, USA. |
| Pseudocode | Yes | Algorithm 1 Greedy Capture; Algorithm 2 Local Capture Heuristic |
| Open Source Code | No | The paper does not contain an explicit statement about releasing the source code for the methodology described, nor does it provide a link to a code repository. |
| Open Datasets | Yes | In this section, we study proportionality on real data taken from the UCI Machine Learning Repository (Dheeru & Karra Taniskidou, 2017). We consider three qualitatively different data sets used for clustering: Iris, Diabetes, and KDD. |
| Dataset Splits | No | The paper mentions using specific datasets (Iris, Diabetes, KDD) but does not provide explicit details on training, validation, or test splits. It only states "we work with a subsample of 100,000 points" for KDD and further sampling for Local Capture algorithm: "sampling 5,000 points uniformly at random to treat as N and sampling 400 points via the k-means++ initialization to treat as M". |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory specifications) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependency details with version numbers (e.g., library names with versions) that would be needed to replicate the experiment environment. |
| Experiment Setup | Yes | In our experiments (see Section 5), we search for the minimum ρ for which the algorithm terminates in a small number of iterations via binary search over possible input of ρ. For efficiency, we run our Local Capture algorithm by further sampling 5,000 points uniformly at random to treat as N and sampling 400 points via the k-means++ initialization to treat as M. The k-means objective is measured on the original 100,000 points for both algorithms. |