Variational Fair Clustering

Authors: Imtiaz Masud Ziko, Jing Yuan, Eric Granger, Ismail Ben Ayed

AAAI 2021, pp. 11202-11209 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We report comprehensive evaluations and comparisons with state-of-the-art methods over various fair clustering benchmarks, which show that our variational formulation can yield highly competitive solutions in terms of fairness and clustering objectives." "In this section, we present comprehensive empirical evaluations of the proposed fair-clustering algorithm, along with comparisons with state-of-the-art fair-clustering techniques."
Researcher Affiliation | Academia | "ETS Montreal, Canada; Xidian University, China"
Pseudocode | Yes | "Algorithm 1 Proposed Fair-clustering"
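The paper's Algorithm 1 alternates closed-form mode (centroid) updates with independent per-point updates of simplex-constrained soft assignments. Below is a minimal sketch in that spirit, assuming a K-means distortion term plus a KL fairness penalty handled through its gradient; the names (V, U, lam, L) and the multiplicative log-space update are illustrative simplifications, and the authors' released repository should be treated as the authoritative implementation.

```python
import numpy as np

def fair_clustering(X, V, U, K, lam=1.0, L=2.0, n_iter=50, seed=0):
    """Sketch of a fair-clustering bound-optimization loop.
    X: (N, d) features; V: (J, N) 0/1 demographic indicators;
    U: (J,) target group proportions (sums to 1); K: number of clusters."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    S = rng.dirichlet(np.ones(K), size=N)  # soft assignments on the simplex
    eps = 1e-10
    for _ in range(n_iter):
        # Mode (centroid) update for the K-means distortion term.
        C = (S.T @ X) / np.maximum(S.sum(axis=0), eps)[:, None]
        a = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)  # costs a_pk
        # Gradient of sum_k KL(U || P_k), where P_jk is the soft proportion
        # of group j in cluster k (the fairness term, linearized).
        m = np.maximum(S.sum(axis=0), eps)           # soft cluster sizes
        P = np.maximum((V @ S) / m, eps)             # (J, K) group proportions
        grad = (1.0 - V.T @ (U[:, None] / P)) / m    # (N, K)
        # Independent per-point multiplicative update, done in log space
        # for numerical stability, then renormalized onto the simplex.
        logS = np.log(S + eps) - (a + lam * grad) / L
        logS -= logS.max(axis=1, keepdims=True)
        S = np.exp(logS)
        S /= S.sum(axis=1, keepdims=True)
    return S.argmax(axis=1), C
```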
Open Source Code | Yes | "Code is available at: https://github.com/imtiazziko/Variational-Fair-Clustering"
Open Datasets | Yes | "We use three datasets from the UCI machine learning repository, one large-scale data set whose demographics are balanced (Census), along with two other data sets with various demographic proportions: Bank dataset ... Adult is a US census record data set from 1994 ... Census is a large-scale data set corresponding to US census record data from 1990."
Dataset Splits | No | The paper mentions using synthetic and real datasets for evaluation and discusses initial partition generation. However, it does not provide specific details on training, validation, or test dataset splits (e.g., percentages, sample counts, or cross-validation methods).
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU model, CPU type, memory) used to run the experiments.
Software Dependencies | No | The paper does not provide specific details on ancillary software dependencies, such as programming languages, libraries, or solvers with their version numbers, needed to replicate the experiments.
Experiment Setup | Yes | "In all the experiments, we fixed L = 2 ... We standardize each dataset by making each feature attribute have zero mean and unit variance. We then performed L2-normalization of the features, and used the standard K-means++ (Arthur and Vassilvitskii 2007) to generate initial partitions for all the models. For Ncut, we use a 20-nearest-neighbor affinity matrix W: w(x_p, x_q) = 1 if data point x_q is within the 20 nearest neighbors of x_p, and equal to 0 otherwise."
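A minimal sketch of this preprocessing and initialization pipeline, assuming scikit-learn; the helper names (preprocess_and_init, knn_affinity) are illustrative, not from the paper or its repository.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, normalize
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors

def preprocess_and_init(X, n_clusters, seed=0):
    # Standardize each feature attribute to zero mean and unit variance.
    X = StandardScaler().fit_transform(X)
    # L2-normalize each sample.
    X = normalize(X, norm="l2")
    # Initial partition via K-means++ seeding, as in the paper's setup.
    init_labels = KMeans(n_clusters=n_clusters, init="k-means++",
                         n_init=10, random_state=seed).fit_predict(X)
    return X, init_labels

def knn_affinity(X, k=20):
    # Binary k-NN affinity for Ncut: w(x_p, x_q) = 1 if x_q is among
    # the k nearest neighbors of x_p, and 0 otherwise.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)
    n = X.shape[0]
    W = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    W[rows, idx[:, 1:].ravel()] = 1.0  # skip column 0 (each point itself)
    return W
```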