Rehashing Kernel Evaluation in High Dimensions
Authors: Paris Siminelakis, Kexin Rong, Peter Bailis, Moses Charikar, Philip Levis
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that these new tools offer up to 10× improvement in evaluation time on a range of synthetic and real-world datasets. |
| Researcher Affiliation | Academia | Paris Siminelakis*¹, Kexin Rong*¹, Peter Bailis¹, Moses Charikar¹, Philip Levis¹. ¹Stanford University, Stanford, California, US. Correspondence to: Paris Siminelakis <psimin@stanford.edu>, Kexin Rong <krong@stanford.edu>. |
| Pseudocode | Yes | Algorithm 1 Data-dependent Diagnostic |
| Open Source Code | Yes | Source code available at: http://github.com/kexinrong/rehashing |
| Open Datasets | Yes | We repeat the above experiments on eight large real-world datasets from various domains... The sources of the datasets are: MSD (Bertin-Mahieux et al., 2011), GloVe (Pennington et al., 2014), SVHN (Netzer et al., 2011), TMY3 (Hendron & Engebrecht, 2010), covtype (Blackard & Dean, 1999), TIMIT (Garofolo, 1993). |
| Dataset Splits | No | No explicit description of dataset splits (e.g., specific percentages for training, validation, or test sets, or cross-validation methodology) was found. |
| Hardware Specification | No | The paper states 'all implementations are in C++ and we report results on a single core' but does not provide specific hardware details such as CPU/GPU models, memory, or cloud instance types. |
| Software Dependencies | No | The paper mentions 'implementations are in C++' and 'open sourced libraries for FigTree and ASKIT' but does not specify version numbers for any software dependencies. |
| Experiment Setup | Yes | We tune each set of parameters via binary search to guarantee an average relative error of at most 0.1. ... We z-normalize each dataset dimension, and tune bandwidth based on Scott's rule (Scott, 2015). We exclude a small percent of queries whose density is below τ = 10⁻⁴. |
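
The preprocessing and error metric quoted in the Experiment Setup row can be sketched roughly as follows. This is an illustrative Python sketch, not the authors' C++ code: the function names are hypothetical, the bandwidth uses the common standardized-data form of Scott's rule, h = n^(-1/(d+4)), and the unnormalized Gaussian kernel exp(-‖x − q‖² / (2h²)) is an assumption that may differ from the paper's exact parameterization.

```python
import numpy as np

def z_normalize(X):
    """Scale each column to zero mean and unit variance.

    Constant columns are mapped to zero rather than divided by zero.
    """
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0
    return (X - mu) / sigma

def scott_bandwidth(n, d):
    """Scott's rule for standardized data (assumed form): h = n^(-1/(d+4))."""
    return n ** (-1.0 / (d + 4))

def exact_kde(X, queries, h):
    """Brute-force density: mean over data points of exp(-||x - q||^2 / (2 h^2)).

    The kernel is left unnormalized (an assumption), so values lie in (0, 1].
    """
    sq_dists = ((queries[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dists / (2.0 * h * h)).mean(axis=1)

def mean_relative_error(estimates, truth, tau=1e-4):
    """Average relative error over queries whose true density is at least tau."""
    keep = truth >= tau
    return np.mean(np.abs(estimates[keep] - truth[keep]) / truth[keep])
```

An approximate estimator would then be tuned until `mean_relative_error` against `exact_kde` is at most 0.1, the target stated in the table.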