reproducibilityindex.ai

Refined Learning Bounds for Kernel and Approximate $k$-Means

Authors: Yong Liu

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In Section 5, we validate our theoretical findings by performing experiments on both simulated and real data.
Researcher Affiliation	Academia	Yong Liu (1, 2); (1) Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China; (2) Beijing Key Laboratory of Big Data Management and Analysis Methods, Beijing, China; liuyonggsai@ruc.edu.cn
Pseudocode	Yes	For the completeness, we briefly describe the improved k-means++ in the following, please refer to [25] for more details. 1: If $\lvert C \rvert < k$, add a sampled point $x \in S$ with probability $\mathrm{cost}(\{\psi(x)\}, C) / \sum_{x \in S} \mathrm{cost}(\{\psi(x)\}, C)$, where $\mathrm{cost}(P, C) = \sum_{x_i \in P} \min_{c \in C} \lVert \Phi_i - c \rVert$, and add $\psi(x)$ to $C$. 2: If $\lvert C \rvert \ge k$, sample $x \in S$ with probability $\mathrm{cost}(\{\psi(x)\}, C) / \sum_{x \in S} \mathrm{cost}(\{\psi(x)\}, C)$, check whether there exists a point $c \in C$ such that $\mathrm{cost}(S, C \setminus \{c\} \cup \{\psi(x)\}) < \mathrm{cost}(S, C)$. If this is the case, we replace c by the point in C that reduces the cost function by the largest amount.
Open Source Code	No	The paper does not provide a direct link or explicit statement about the availability of its own source code.
Open Datasets	Yes	We use 6 publicly available datasets, dna, segment, mushrooms, mnist, skin-nonskin and covtype, from the LIBSVM Data.
Dataset Splits	Yes	We generate $\sum_{i=1}^{k} \lvert C_i \rvert$ samples of k clustering centers for training and 10,000 samples for testing.
Hardware Specification	No	The paper does not provide specific hardware details (e.g., CPU, GPU models, or memory) used for running the experiments.
Software Dependencies	No	The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, or specific libraries with their versions).
Experiment Setup	No	The paper mentions using a "Gaussian kernel $\kappa(x, x') = \exp(-\lVert x - x' \rVert^2 / \sigma^2)$" but does not specify the value of $\sigma$ or any other hyperparameters for the kernel or for Lloyd's algorithm used in the experiments, so the experiments are not fully reproducible.
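
The Pseudocode and Experiment Setup rows quote a Gaussian kernel and a two-step improved k-means++ seeding. The sketch below shows one way these could fit together in NumPy; it is not the author's code, which the table notes is unavailable. Several choices are assumptions rather than facts from the paper: centers are restricted to feature images of sample points (stored as indices), the cost uses the squared feature-space distance as in standard k-means++, the first center is drawn uniformly, and `sigma` and `n_swaps` (the number of step-2 attempts) are hypothetical parameters the quoted text does not specify.

```python
import numpy as np


def gaussian_kernel(X, sigma):
    """Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / sigma^2)."""
    sq_norms = np.sum(X * X, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negative round-off
    return np.exp(-sq_dists / sigma**2)


def feature_sq_dists(K):
    """D[i, j] = ||psi(x_i) - psi(x_j)||^2 = K_ii + K_jj - 2 K_ij."""
    diag = np.diag(K)
    return np.maximum(diag[:, None] + diag[None, :] - 2.0 * K, 0.0)


def point_costs(D, centers):
    """Per-point cost: squared distance to the nearest chosen center."""
    return D[:, centers].min(axis=1)


def sample_by_cost(costs, rng):
    """Draw one index with probability proportional to its cost."""
    total = costs.sum()
    if total <= 0.0:  # every point coincides with a center: fall back to uniform
        return int(rng.integers(len(costs)))
    return int(rng.choice(len(costs), p=costs / total))


def improved_kmeanspp(K, k, n_swaps, rng):
    """Return indices of k sample points whose feature images act as centers."""
    D = feature_sq_dists(K)
    n = D.shape[0]
    centers = [int(rng.integers(n))]  # assumption: first center chosen uniformly

    # Step 1: while |C| < k, add a point sampled proportionally to its cost.
    while len(centers) < k:
        centers.append(sample_by_cost(point_costs(D, centers), rng))

    # Step 2: sample a candidate, then apply the single swap that lowers the
    # total cost the most, if any swap lowers it at all.
    for _ in range(n_swaps):
        costs = point_costs(D, centers)
        cand = sample_by_cost(costs, rng)
        best_total, best_pos = costs.sum(), None
        for pos in range(k):
            trial = centers[:pos] + centers[pos + 1:] + [cand]
            trial_total = point_costs(D, trial).sum()
            if trial_total < best_total:
                best_total, best_pos = trial_total, pos
        if best_pos is not None:
            centers[best_pos] = cand
    return centers
```

A call such as `improved_kmeanspp(gaussian_kernel(X, sigma=1.0), k=3, n_swaps=20, rng=np.random.default_rng(0))` returns three row indices of `X`. Working from the Gram matrix keeps every distance computation in feature space without forming $\psi(x)$ explicitly, at the price of $O(n^2)$ memory.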