Rehashing Kernel Evaluation in High Dimensions

Authors: Paris Siminelakis, Kexin Rong, Peter Bailis, Moses Charikar, Philip Levis

ICML 2019

Reproducibility Variable | Result | LLM Response
--- | --- | ---
Research Type | Experimental | "Our experiments show that these new tools offer up to 10x improvement in evaluation time on a range of synthetic and real-world datasets."
Researcher Affiliation | Academia | "Paris Siminelakis* 1, Kexin Rong* 1, Peter Bailis 1, Moses Charikar 1, Philip Levis 1. 1Stanford University, Stanford, California, USA. Correspondence to: Paris Siminelakis <psimin@stanford.edu>, Kexin Rong <krong@stanford.edu>."
Pseudocode | Yes | "Algorithm 1: Data-dependent Diagnostic"
Open Source Code | Yes | Source code available at: http://github.com/kexinrong/rehashing
Open Datasets | Yes | "We repeat the above experiments on eight large real-world datasets from various domains... The sources of the datasets are: MSD (Bertin-Mahieux et al., 2011), GloVe (Pennington et al., 2014), SVHN (Netzer et al., 2011), TMY3 (Hendron & Engebrecht, 2010), covtype (Blackard & Dean, 1999), TIMIT (Garofolo, 1993)."
Dataset Splits | No | No explicit description of dataset splits (e.g., specific percentages for training, validation, or test sets, or cross-validation methodology) was found.
Hardware Specification | No | The paper states "all implementations are in C++ and we report results on a single core" but does not provide specific hardware details such as CPU/GPU models, memory, or cloud instance types.
Software Dependencies | No | The paper mentions "implementations are in C++" and "open sourced libraries for FigTree and ASKIT" but does not specify version numbers for any software dependencies.
Experiment Setup | Yes | "We tune each set of parameters via binary search to guarantee an average relative error of at most 0.1. ... We z-normalize each dataset dimension, and tune bandwidth based on Scott's rule (Scott, 2015). We exclude a small percent of queries whose density is below τ = 10^-4."
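The preprocessing quoted in the Experiment Setup row (per-dimension z-normalization followed by Scott's-rule bandwidth selection) can be sketched as below. This is a minimal NumPy illustration under our own naming, not code from the paper's released repository; after z-normalization every dimension has unit variance, so Scott's rule reduces to h = n^(-1/(d+4)).

```python
import numpy as np

def z_normalize(X):
    """Standardize each dimension (column) to zero mean and unit variance."""
    std = X.std(axis=0)
    std[std == 0] = 1.0  # guard against constant columns
    return (X - X.mean(axis=0)) / std

def scott_bandwidth(X):
    """Scott's rule bandwidth for an (n, d) z-normalized dataset.

    With unit per-dimension standard deviation, Scott's rule gives
    h = n ** (-1 / (d + 4)) for every dimension.
    """
    n, d = X.shape
    return n ** (-1.0 / (d + 4))

# Hypothetical usage on synthetic data:
X = np.random.default_rng(0).normal(size=(1000, 10))
Z = z_normalize(X)
h = scott_bandwidth(Z)
```

For a dataset with n = 1000 points in d = 10 dimensions this yields h = 1000^(-1/14) ≈ 0.61; the paper then tunes the remaining estimator parameters by binary search until the average relative error is at most 0.1.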