Space and Time Efficient Kernel Density Estimation in High Dimensions
Authors: Arturs Backurs, Piotr Indyk, Tal Wagner
NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on various datasets verify that our approach attains accuracy and query time similar to Charikar and Siminelakis (2017), with significantly improved space and preprocessing time. 4 Empirical Evaluation |
| Researcher Affiliation | Academia | Arturs Backurs TTIC backurs@ttic.edu Piotr Indyk MIT indyk@mit.edu Tal Wagner MIT talw@mit.edu |
| Pseudocode | Yes | Algorithm 1 : Space-Efficient HBE |
| Open Source Code | Yes | Our code is available at https://github.com/talwagner/efficient_kde. |
| Open Datasets | Yes | Covertype [BD99] and Census, and the latter two are Glo Ve [PSM14] and MNIST [LC98]. Their properties are summarized in Table 2. |
| Dataset Splits | No | The paper uses standard datasets (Covertype, Census, Glo Ve, MNIST) but does not provide explicit details about train/validation/test splits, percentages, or methodology for data partitioning beyond stating that 100 query points were randomly chosen. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, memory, or cloud instance types used for running experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers. |
| Experiment Setup | Yes | Note that it is parameterized by the number of hash tables L, while its analysis in Theorem 1 is parameterized in terms of τ, ϵ, where we recall that L = Θ(1/( τϵ2)). For practical implementation, parameterizing by L is more natural since it acts as a smooth handle on the resources to accuracy tradeoff larger L yields better KDE estimates at the expense of using more time and space... For each dataset we choose two bandwidth settings, one which yields median KDE values of order 10 2, and the other of order 10 3. The bandwidth values and their precise method of choice are specified in the appendix. |