Nyström Method with Kernel K-means++ Samples as Landmarks
Authors: Dino Oglic, Thomas Gärtner
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We substantiate our theoretical results empirically by comparing the approach to several state-of-the-art algorithms. The results of our empirical study are presented in Section 5 and indicate a superior performance of the proposed approach over competing methods. In the first set of experiments, we fix the approximation rank and evaluate the performance of the landmark selection strategies while varying the bandwidth of the Gaussian kernel. |
| Researcher Affiliation | Academia | 1Institut für Informatik III, Universität Bonn, Germany 2School of Computer Science, The University of Nottingham, United Kingdom. |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access information (e.g., repository link, explicit statement of code release) for the source code of the described methodology. |
| Open Datasets | Yes | The experiments were performed on 13 real-world datasets available at the UCI and LIACC repositories. |
| Dataset Splits | No | The paper does not explicitly provide specific training, validation, or test dataset splits needed to reproduce data partitioning, as the focus is on matrix approximation rather than typical supervised learning model training. |
| Hardware Specification | No | The paper only mentions access to a 'University of Nottingham High Performance Computing Facility' but does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiment. |
| Experiment Setup | Yes | We measure the goodness of a landmark selection strategy with the lift of the approximation error in the Frobenius norm and the time needed to select the landmarks. The approximation rank is fixed to K = 100. We refer to γ = 1/σ² as the bandwidth of the Gaussian kernel, and to determine the bandwidth interval we sample 5,000 instances and compute their squared pairwise distances. From these distances we take the inverses of the 1 and 99 percentile values as the right and left endpoints, respectively. To force the kernel matrix to have a large number of significant spectral components (i.e., the Gaussian kernel matrix approaches the identity matrix), we require the right bandwidth endpoint to be at least 1. From the logspace of the determined interval we choose 10 evenly spaced values as bandwidth parameters. |
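The bandwidth-grid construction quoted above can be sketched in NumPy as follows. This is a hedged reconstruction, not the authors' code: the function name `gaussian_bandwidth_grid`, the sampling seed, and the handling of the sample size are assumptions; only the percentile-inversion rule, the right-endpoint floor of 1, and the 10-point logspace grid come from the paper's description.

```python
import numpy as np

def gaussian_bandwidth_grid(X, n_sample=5000, n_values=10, seed=0):
    """Sketch of the described bandwidth-interval procedure (assumed helper).

    Samples up to `n_sample` instances, computes their squared pairwise
    Euclidean distances, and takes the inverses of the 1st and 99th
    percentile distances as the right and left interval endpoints. The
    right endpoint is forced to be at least 1, and `n_values` bandwidths
    gamma = 1/sigma^2 are drawn evenly from the logspace of the interval.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=min(n_sample, len(X)), replace=False)
    S = X[idx]
    # squared pairwise distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq_norms = np.sum(S ** 2, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (S @ S.T)
    d2 = np.maximum(d2, 0.0)                    # guard against round-off
    d2 = d2[np.triu_indices(len(S), k=1)]       # keep off-diagonal pairs
    # small distances give large bandwidths: 1/P1 is the right endpoint,
    # 1/P99 the left; the right endpoint is floored at 1 per the paper
    left = 1.0 / np.percentile(d2, 99)
    right = max(1.0 / np.percentile(d2, 1), 1.0)
    return np.logspace(np.log10(left), np.log10(right), n_values)
```

Inverting the percentiles swaps the roles of the endpoints because a small squared distance corresponds to a large γ; flooring the right endpoint at 1 guarantees the grid reaches the regime where the kernel matrix has many significant spectral components.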