Refined Learning Bounds for Kernel and Approximate $k$-Means

Authors: Yong Liu

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In Section 5, we validate our theoretical findings by performing experiments on both simulated and real data."
Researcher Affiliation | Academia | Yong Liu, Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China; Beijing Key Laboratory of Big Data Management and Analysis Methods, Beijing, China; liuyonggsai@ruc.edu.cn
Pseudocode | Yes | "For completeness, we briefly describe the improved k-means++ below; please refer to [25] for more details. 1: If |C| < k, sample a point x ∈ S with probability cost({ψ(x)}, C) / Σ_{x'∈S} cost({ψ(x')}, C), where cost(P, C) = Σ_{x_i∈P} min_{c∈C} ‖Φ_i − c‖², and add ψ(x) to C. 2: If |C| ≥ k, sample x ∈ S with probability cost({ψ(x)}, C) / Σ_{x'∈S} cost({ψ(x')}, C) and check whether there exists a center c ∈ C such that cost(S, C \ {c} ∪ {ψ(x)}) < cost(S, C). If so, replace the center in C whose replacement by ψ(x) reduces the cost function by the largest amount."
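The two steps quoted above (D²-sampling while |C| < k, then local-search swaps once |C| ≥ k) can be sketched as follows. This is an illustrative NumPy version that works directly in input space; the paper applies the procedure to feature-mapped points ψ(x) via a kernel. The function names and the `swap_rounds` parameter are our own placeholders, not from the paper.

```python
import numpy as np

def cost(S, C):
    """Sum over points in S of squared distance to the nearest center in C."""
    d2 = ((S[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1)  # (n, k) pairwise
    return d2.min(axis=1).sum()

def d2_probs(S, C):
    """D^2-sampling distribution: each point weighted by its cost contribution."""
    d2 = ((S[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1).min(axis=1)
    return d2 / d2.sum()

def improved_kmeans_pp(S, k, swap_rounds=10, rng=None):
    """Swap-based k-means++ seeding sketch (swap_rounds is an assumed knob)."""
    rng = np.random.default_rng(rng)
    # First center: uniform at random; then D^2-sample until |C| = k.
    C = [S[rng.integers(len(S))]]
    while len(C) < k:
        p = d2_probs(S, np.array(C))
        C.append(S[rng.choice(len(S), p=p)])
    C = np.array(C)
    # Local search: D^2-sample a candidate, try replacing each center with it,
    # and keep the single swap that reduces the cost the most (if any).
    for _ in range(swap_rounds):
        x = S[rng.choice(len(S), p=d2_probs(S, C))]
        best_j, best_cost = None, cost(S, C)
        for j in range(k):
            trial = C.copy()
            trial[j] = x
            c = cost(S, trial)
            if c < best_cost:
                best_j, best_cost = j, c
        if best_j is not None:
            C[best_j] = x
    return C
```

Each swap round costs one full pass over the data per center, so this sketch is quadratic in k per round; it is meant to mirror the quoted steps, not to be efficient.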
Open Source Code | No | The paper does not provide a direct link or an explicit statement about the availability of its own source code.
Open Datasets | Yes | "We use 6 publicly available datasets, dna, segment, mushrooms, mnist, skin-nonskin and covtype, from the LIBSVM Data repository."
Dataset Splits | Yes | "We generate Σ_{i=1}^{k} |C_i| samples of k clustering centers for training and 10,000 samples for testing."
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU or GPU models, or memory) used to run the experiments.
Software Dependencies | No | The paper does not list software dependencies with version numbers (e.g., Python, PyTorch, or specific libraries with their versions).
Experiment Setup | No | The paper mentions using a Gaussian kernel κ(x, x′) = exp(−‖x − x′‖² / σ²) but does not specify the value of σ or any other hyperparameters for the kernel or for Lloyd's algorithm used in the experiments, so the setup is not fully reproducible.