Rehashing Kernel Evaluation in High Dimensions

Authors: Paris Siminelakis, Kexin Rong, Peter Bailis, Moses Charikar, Philip Levis

ICML 2019

Reproducibility Variable | Result | LLM Response
--- | --- | ---
Research Type | Experimental | "Our experiments show that these new tools offer up to 10x improvement in evaluation time on a range of synthetic and real-world datasets."
Researcher Affiliation | Academia | "Paris Siminelakis* 1, Kexin Rong* 1, Peter Bailis 1, Moses Charikar 1, Philip Levis 1. 1Stanford University, Stanford, California, USA. Correspondence to: Paris Siminelakis <psimin@stanford.edu>, Kexin Rong <krong@stanford.edu>."
Pseudocode | Yes | "Algorithm 1: Data-dependent Diagnostic"
Open Source Code | Yes | Source code available at: http://github.com/kexinrong/rehashing
Open Datasets | Yes | "We repeat the above experiments on eight large real-world datasets from various domains... The sources of the datasets are: MSD (Bertin-Mahieux et al., 2011), GloVe (Pennington et al., 2014), SVHN (Netzer et al., 2011), TMY3 (Hendron & Engebrecht, 2010), covtype (Blackard & Dean, 1999), TIMIT (Garofolo, 1993)."
Dataset Splits | No | No explicit description of dataset splits (e.g., specific percentages for training, validation, or test sets, or cross-validation methodology) was found.
Hardware Specification | No | The paper states "all implementations are in C++ and we report results on a single core" but does not provide specific hardware details such as CPU/GPU models, memory, or cloud instance types.
Software Dependencies | No | The paper mentions "implementations are in C++" and "open sourced libraries for FigTree and ASKIT" but does not specify version numbers for any software dependencies.
Experiment Setup | Yes | "We tune each set of parameters via binary search to guarantee an average relative error of at most 0.1. ... We z-normalize each dataset dimension, and tune bandwidth based on Scott's rule (Scott, 2015). We exclude a small percent of queries whose density is below τ = 10^-4."
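The preprocessing quoted in the Experiment Setup row (per-dimension z-normalization followed by Scott's-rule bandwidth selection) can be sketched as below. This is a minimal NumPy illustration under our own naming, not code from the paper's released repository; after z-normalization every dimension has unit variance, so Scott's rule reduces to h = n^(-1/(d+4)).

```python
import numpy as np

def z_normalize(X):
    """Standardize each dimension (column) to zero mean and unit variance."""
    std = X.std(axis=0)
    std[std == 0] = 1.0  # guard against constant columns
    return (X - X.mean(axis=0)) / std

def scott_bandwidth(X):
    """Scott's rule bandwidth for an (n, d) z-normalized dataset.

    With unit per-dimension standard deviation, Scott's rule gives
    h = n ** (-1 / (d + 4)) for every dimension.
    """
    n, d = X.shape
    return n ** (-1.0 / (d + 4))

# Hypothetical usage on synthetic data:
X = np.random.default_rng(0).normal(size=(1000, 10))
Z = z_normalize(X)
h = scott_bandwidth(Z)
```

For a dataset with n = 1000 points in d = 10 dimensions this yields h = 1000^(-1/14) ≈ 0.61; the paper then tunes the remaining estimator parameters by binary search until the average relative error is at most 0.1.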