Variational Fair Clustering

Authors: Imtiaz Masud Ziko, Jing Yuan, Eric Granger, Ismail Ben Ayed

AAAI 2021, pp. 11202-11209 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We report comprehensive evaluations and comparisons with state-of-the-art methods over various fair clustering benchmarks, which show that our variational formulation can yield highly competitive solutions in terms of fairness and clustering objectives." "In this section, we present comprehensive empirical evaluations of the proposed fair-clustering algorithm, along with comparisons with state-of-the-art fair-clustering techniques."
Researcher Affiliation | Academia | "ETS Montreal, Canada; Xidian University, China"
Pseudocode | Yes | "Algorithm 1 Proposed Fair-clustering"
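The paper's Algorithm 1 alternates closed-form mode (centroid) updates with independent per-point updates of simplex-constrained soft assignments. Below is a minimal sketch in that spirit, assuming a K-means distortion term plus a KL fairness penalty handled through its gradient; the names (V, U, lam, L) and the multiplicative log-space update are illustrative simplifications, and the authors' released repository should be treated as the authoritative implementation.

```python
import numpy as np

def fair_clustering(X, V, U, K, lam=1.0, L=2.0, n_iter=50, seed=0):
    """Sketch of a fair-clustering bound-optimization loop.
    X: (N, d) features; V: (J, N) 0/1 demographic indicators;
    U: (J,) target group proportions (sums to 1); K: number of clusters."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    S = rng.dirichlet(np.ones(K), size=N)  # soft assignments on the simplex
    eps = 1e-10
    for _ in range(n_iter):
        # Mode (centroid) update for the K-means distortion term.
        C = (S.T @ X) / np.maximum(S.sum(axis=0), eps)[:, None]
        a = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)  # costs a_pk
        # Gradient of sum_k KL(U || P_k), where P_jk is the soft proportion
        # of group j in cluster k (the fairness term, linearized).
        m = np.maximum(S.sum(axis=0), eps)           # soft cluster sizes
        P = np.maximum((V @ S) / m, eps)             # (J, K) group proportions
        grad = (1.0 - V.T @ (U[:, None] / P)) / m    # (N, K)
        # Independent per-point multiplicative update, done in log space
        # for numerical stability, then renormalized onto the simplex.
        logS = np.log(S + eps) - (a + lam * grad) / L
        logS -= logS.max(axis=1, keepdims=True)
        S = np.exp(logS)
        S /= S.sum(axis=1, keepdims=True)
    return S.argmax(axis=1), C
```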
Open Source Code | Yes | "Code is available at: https://github.com/imtiazziko/Variational-Fair-Clustering"
Open Datasets | Yes | "We use three datasets from the UCI machine learning repository, one large-scale data set whose demographics are balanced (Census), along with two other data sets with various demographic proportions: Bank dataset ... Adult is a US census record data set from 1994 ... Census is a large-scale data set corresponding to US census record data from 1990."
Dataset Splits | No | The paper mentions using synthetic and real datasets for evaluation and discusses initial partition generation. However, it does not provide specific details on training, validation, or test dataset splits (e.g., percentages, sample counts, or cross-validation methods).
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU model, CPU type, memory) used to run the experiments.
Software Dependencies | No | The paper does not provide specific details on ancillary software dependencies, such as programming languages, libraries, or solvers with their version numbers, needed to replicate the experiments.
Experiment Setup | Yes | "In all the experiments, we fixed L = 2 ... We standardize each dataset by making each feature attribute have zero mean and unit variance. We then performed L2-normalization of the features, and used the standard K-means++ (Arthur and Vassilvitskii 2007) to generate initial partitions for all the models. For Ncut, we use a 20-nearest-neighbor affinity matrix W: w(x_p, x_q) = 1 if data point x_q is within the 20 nearest neighbors of x_p, and equal to 0 otherwise."
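A minimal sketch of this preprocessing and initialization pipeline, assuming scikit-learn; the helper names (preprocess_and_init, knn_affinity) are illustrative, not from the paper or its repository.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, normalize
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors

def preprocess_and_init(X, n_clusters, seed=0):
    # Standardize each feature attribute to zero mean and unit variance.
    X = StandardScaler().fit_transform(X)
    # L2-normalize each sample.
    X = normalize(X, norm="l2")
    # Initial partition via K-means++ seeding, as in the paper's setup.
    init_labels = KMeans(n_clusters=n_clusters, init="k-means++",
                         n_init=10, random_state=seed).fit_predict(X)
    return X, init_labels

def knn_affinity(X, k=20):
    # Binary k-NN affinity for Ncut: w(x_p, x_q) = 1 if x_q is among
    # the k nearest neighbors of x_p, and 0 otherwise.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)
    n = X.shape[0]
    W = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    W[rows, idx[:, 1:].ravel()] = 1.0  # skip column 0 (each point itself)
    return W
```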