Proportionally Fair Clustering

Authors: Xingyu Chen, Brandon Fain, Liang Lyu, Kamesh Munagala

ICML 2019

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We conclude with an empirical examination of the tradeoff between proportional solutions and the k-means objective." |
| Researcher Affiliation | Academia | Xingyu Chen, Brandon Fain, Liang Lyu, Kamesh Munagala: Department of Computer Science, Duke University, Durham, North Carolina, USA. |
| Pseudocode | Yes | Algorithm 1 (Greedy Capture); Algorithm 2 (Local Capture Heuristic). Hedged sketches of both appear after this table. |
| Open Source Code | No | The paper contains no explicit statement about releasing source code for the methodology described, nor does it link to a code repository. |
| Open Datasets | Yes | "In this section, we study proportionality on real data taken from the UCI Machine Learning Repository (Dheeru & Karra Taniskidou, 2017). We consider three qualitatively different data sets used for clustering: Iris, Diabetes, and KDD." |
| Dataset Splits | No | The paper uses the Iris, Diabetes, and KDD datasets but gives no training/validation/test splits. It states only that "we work with a subsample of 100,000 points" for KDD, with further sampling for the Local Capture algorithm: "sampling 5,000 points uniformly at random to treat as N and sampling 400 points via the k-means++ initialization to treat as M". |
| Hardware Specification | No | The paper does not report the hardware (e.g., CPU/GPU models or memory) used to run the experiments. |
| Software Dependencies | No | The paper does not list software dependencies with version numbers, which would be needed to replicate the experiment environment. |
| Experiment Setup | Yes | "In our experiments (see Section 5), we search for the minimum ρ for which the algorithm terminates in a small number of iterations via binary search over possible input of ρ. For efficiency, we run our Local Capture algorithm by further sampling 5,000 points uniformly at random to treat as N and sampling 400 points via the k-means++ initialization to treat as M. The k-means objective is measured on the original 100,000 points for both algorithms." A sketch of this binary search appears after the table. |