Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Clustering What Matters: Optimal Approximation for Clustering with Outliers

Authors: Akanksha Agrawal, Tanmay Inamdar, Saket Saurabh, Jie Xue

JAIR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this paper, we give a general approach for solving clustering with outliers, which results in a fixed-parameter tractable (FPT) algorithm in k and m, i.e., an algorithm with running time of the form f(k, m) · n^O(1) for some function f, that almost matches the approximation ratio for its outlier-free counterpart. As a corollary, we obtain FPT approximation algorithms with optimal approximation ratios for k-Median and k-Means with outliers in general and Euclidean metrics.
Researcher Affiliation | Academia | Akanksha Agrawal (EMAIL), Indian Institute of Technology, Chennai 600036, India; Tanmay Inamdar (EMAIL), University of Bergen, Bergen 5007, Norway; Saket Saurabh (EMAIL), Institute of Mathematical Sciences, 4th Cross St, Chennai 600113, India, and University of Bergen, Bergen 5007, Norway; Jie Xue (EMAIL), NYU Shanghai, 1555 Shiji Blvd, Pudong, Shanghai 200122, China
Pseudocode | No | Figure 1: A conceptual overview of our approximation-preserving reduction from k-Median-Out to k-Median. Here, A denotes a β-approximation algorithm for k-Median, which we use as a black box on each of the instances I1, ..., It of k-Median.
Open Source Code | No | The paper does not provide any statement or link regarding the availability of source code for the methodology described.
Open Datasets | No | The paper describes theoretical algorithms and their approximation ratios for clustering problems. It does not present any experimental evaluation using specific datasets or provide access information for open datasets.
Dataset Splits | No | The paper does not present experimental results or use specific datasets; therefore, there is no mention of training/test/validation splits.
Hardware Specification | No | The paper focuses on theoretical algorithm design and analysis, and thus does not describe the hardware used for experimental evaluation.
Software Dependencies | No | The paper focuses on theoretical algorithm design and analysis, and thus does not mention any specific software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical in nature and does not describe an experimental setup, hyperparameters, or training configurations.
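The Pseudocode row above refers to Figure 1's approximation-preserving reduction, which runs a black-box β-approximation algorithm A on each of the reduced instances I1, ..., It and keeps the cheapest solution found. The paper's actual construction of those instances is not reproduced here; the following is only a minimal sketch of the black-box pattern, with all names (solve_with_outliers, approx_algo, cost) being illustrative, not from the paper:

```python
def solve_with_outliers(instances, approx_algo, cost):
    """Run a black-box approximation algorithm on each reduced
    instance and return the minimum-cost solution found.

    instances  -- the reduced k-Median instances I1, ..., It
                  (their construction is the paper's contribution
                  and is not sketched here)
    approx_algo -- black-box beta-approximation for the outlier-free
                   problem; maps an instance to a solution
    cost        -- evaluates a solution's objective value
    """
    best = None
    for instance in instances:
        solution = approx_algo(instance)
        if best is None or cost(solution) < cost(best):
            best = solution
    return best
```

A toy invocation, with a stand-in "algorithm" that just sums an instance's points, shows the selection behavior: `solve_with_outliers([[3], [1], [2]], sum, lambda s: s)` returns 1, the cheapest of the three black-box outputs.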