Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Rehashing Kernel Evaluation in High Dimensions
Authors: Paris Siminelakis, Kexin Rong, Peter Bailis, Moses Charikar, Philip Levis
ICML 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that these new tools offer up to 10× improvement in evaluation time on a range of synthetic and real-world datasets. |
| Researcher Affiliation | Academia | Paris Siminelakis*¹, Kexin Rong*¹, Peter Bailis¹, Moses Charikar¹, Philip Levis¹. ¹Stanford University, Stanford, California, US. Correspondence to: Paris Siminelakis <EMAIL>, Kexin Rong <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Data-dependent Diagnostic |
| Open Source Code | Yes | Source code available at: http://github.com/kexinrong/rehashing |
| Open Datasets | Yes | We repeat the above experiments on eight large real-world datasets from various domains... The sources of the datasets are: MSD (Bertin-Mahieux et al., 2011), GloVe (Pennington et al., 2014), SVHN (Netzer et al., 2011), TMY3 (Hendron & Engebrecht, 2010), covtype (Blackard & Dean, 1999), TIMIT (Garofolo, 1993). |
| Dataset Splits | No | No explicit description of dataset splits (e.g., specific percentages for training, validation, or test sets, or cross-validation methodology) was found. |
| Hardware Specification | No | The paper states 'all implementations are in C++ and we report results on a single core' but does not provide specific hardware details such as CPU/GPU models, memory, or cloud instance types. |
| Software Dependencies | No | The paper mentions 'implementations are in C++' and 'open sourced libraries for FigTree and ASKIT' but does not specify version numbers for any software dependencies. |
| Experiment Setup | Yes | We tune each set of parameters via binary search to guarantee an average relative error of at most 0.1. ... We z-normalize each dataset dimension, and tune bandwidth based on Scott's rule (Scott, 2015). We exclude a small percent of queries whose density is below τ = 10⁻⁴. |
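
The preprocessing quoted in the Experiment Setup row (z-normalization, a Scott's-rule bandwidth, and dropping low-density queries) can be illustrated with a minimal Python sketch. This is not the authors' C++ implementation. The function names, the unnormalized Gaussian kernel form, the use of the population standard deviation, and the brute-force density evaluation are all assumptions made for illustration; the paper's exact kernel and bandwidth tuning may differ.

```python
import math


def z_normalize(points):
    """Shift each dimension to zero mean and scale it to unit population std."""
    n, d = len(points), len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(d)]
    stds = [
        math.sqrt(sum((p[j] - means[j]) ** 2 for p in points) / n)
        for j in range(d)
    ]
    # Constant dimensions (std == 0) are only centered, to avoid dividing by zero.
    return [
        [(p[j] - means[j]) / (stds[j] or 1.0) for j in range(d)] for p in points
    ]


def scott_bandwidth(n, d):
    """Scott's rule factor n^(-1/(d+4)).

    For z-normalized data every dimension has unit std, so this factor
    can be used directly as the bandwidth.
    """
    return n ** (-1.0 / (d + 4))


def gaussian_kde(query, points, h):
    """Brute-force density: mean of exp(-||x - q||^2 / (2 h^2)) over all points.

    Assumption: an unnormalized Gaussian kernel, so values lie in [0, 1].
    """
    total = 0.0
    for p in points:
        sq_dist = sum((a - b) ** 2 for a, b in zip(p, query))
        total += math.exp(-sq_dist / (2.0 * h * h))
    return total / len(points)


def keep_queries(queries, points, h, tau=1e-4):
    """Keep only the queries whose exact density is at least tau."""
    return [q for q in queries if gaussian_kde(q, points, h) >= tau]
```

A typical use would be to normalize the dataset once, compute `h = scott_bandwidth(len(data), len(data[0]))`, and then filter the query set with `keep_queries` before measuring relative error. The brute-force density here costs time linear in the dataset size per query, so it only serves as a reference for small inputs.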