Nearly-Tight and Oblivious Algorithms for Explainable Clustering
Authors: Buddhima Gamlath, Xinrui Jia, Adam Polak, Ola Svensson
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We study the problem of explainable clustering in the setting first formalized by Dasgupta, Frost, Moshkovitz, and Rashtchian (ICML 2020). A k-clustering is said to be explainable if it is given by a decision tree where each internal node splits data points with a threshold cut in a single dimension (feature), and each of the k leaves corresponds to a cluster. We give an algorithm that outputs an explainable clustering that loses at most a factor of O(log² k) compared to an optimal (not necessarily explainable) clustering for the k-medians objective, and a factor of O(k log² k) for the k-means objective. This improves over the previous best upper bounds of O(k) and O(k²), respectively, and nearly matches the previous Ω(log k) lower bound for k-medians and our new Ω(k) lower bound for k-means. The algorithm is remarkably simple. In particular, given an initial not necessarily explainable clustering in ℝᵈ, it is oblivious to the data points and runs in time O(dk log² k), independent of the number of data points n. Our upper and lower bounds also generalize to objectives given by higher ℓ_p-norms. |
| Researcher Affiliation | Academia | Buddhima Gamlath (EPFL, buddhima.gamlath@epfl.ch); Xinrui Jia (EPFL, xinrui.jia@epfl.ch); Adam Polak (EPFL, adam.polak@epfl.ch); Ola Svensson (EPFL, ola.svensson@epfl.ch) |
| Pseudocode | Yes | Algorithm 1: Explainable k-medians algorithm. |
| Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository. |
| Open Datasets | No | The paper is theoretical and does not describe experiments that use datasets for training. It refers to "data points" only in a theoretical context, not to specific datasets or empirical training. |
| Dataset Splits | No | The paper is theoretical and does not describe experiments with dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and focuses on algorithm design and analysis. It does not mention any hardware specifications used for experiments. |
| Software Dependencies | No | The paper is theoretical and describes algorithms and proofs. It does not list any software dependencies, with or without version numbers, needed to implement or run experiments. |
| Experiment Setup | No | The paper is theoretical and does not describe any experimental setup details such as hyperparameters or specific training configurations. |
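The paper's notion of an explainable clustering, a decision tree whose internal nodes are single-dimension threshold cuts and whose k leaves are the clusters, can be illustrated with a minimal sketch. The code below builds such a tree over a given set of reference centers and routes points to clusters; the widest-spread/midpoint splitting rule is a simple deterministic heuristic chosen for illustration, not the paper's randomized O(log² k)-competitive algorithm, and assumes the centers are distinct.

```python
import numpy as np

def build_threshold_tree(centers, index=None):
    """Recursively separate reference centers with axis-aligned
    threshold cuts until each leaf holds exactly one center.
    Illustrative heuristic only (assumes distinct centers); not the
    paper's randomized algorithm."""
    if index is None:
        index = np.arange(len(centers))
    if len(index) == 1:
        return ("leaf", int(index[0]))
    sub = centers[index]
    # Split on the dimension where the remaining centers are most spread out.
    dim = int(np.argmax(sub[:, :].max(axis=0) - sub[:, :].min(axis=0)))
    lo, hi = sub[:, dim].min(), sub[:, dim].max()
    theta = (lo + hi) / 2  # midpoint threshold: both sides stay nonempty
    left = index[sub[:, dim] <= theta]
    right = index[sub[:, dim] > theta]
    return ("node", dim, theta,
            build_threshold_tree(centers, left),
            build_threshold_tree(centers, right))

def assign(tree, point):
    """Route a data point down the threshold tree to its cluster leaf."""
    while tree[0] == "node":
        _, dim, theta, left, right = tree
        tree = left if point[dim] <= theta else right
    return tree[1]
```

Note that, as in the paper's oblivious setting, the tree is built from the k centers alone: the n data points are only touched when they are routed down the tree.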