Fully Understanding The Hashing Trick

Authors: Casper B. Freksen, Lior Kamma, Kasper Green Larsen

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our theoretical bounds are accompanied by empirical results that shed light on the nature of the constants in Theorem 2. Our empirical results show that in practice the constants inside the Θ-notation are significantly tighter than the theoretical proof might suggest, and in fact feature hashing performs well for a larger scope of vectors."
Researcher Affiliation | Academia | Casper Freksen, Department of Computer Science, Aarhus University, Denmark, cfreksen@cs.au.dk; Lior Kamma, Department of Computer Science, Aarhus University, Denmark, lior.kamma@cs.au.dk; Kasper Green Larsen, Department of Computer Science, Aarhus University, Denmark, larsen@cs.au.dk
Pseudocode | No | The paper describes its methods mathematically and textually but contains no structured pseudocode or algorithm blocks with explicit labels such as "Algorithm" or "Pseudocode".
Open Source Code | No | The paper provides no concrete access to source code for the methodology described, nor does it include a link to a code repository or an explicit statement of code release.
Open Datasets | Yes | "We also ran experiments on real-world data, namely bag-of-words representations of 1500 NIPS papers with stopwords removed [DKT17b, New08]."
Dataset Splits | No | The paper describes generating synthetic data and using real-world datasets to evaluate the properties of feature hashing, but it does not specify any training, validation, or test splits in the context of model training and evaluation.
Hardware Specification | No | The paper does not provide specific hardware details (such as GPU/CPU models, processor types, or memory amounts) used to run its experiments.
Software Dependencies | No | The paper names the algorithms and methods used (e.g., "double tabulation hashing") but does not list ancillary software dependencies with version numbers (e.g., Python or library versions) needed for replication.
Experiment Setup | Yes | "In the first phase we varied the target dimension m over exponentially spaced values in the range [2^4, 2^14], and a parameter k which controls the ratio between the ℓ∞ and the ℓ2 norms. The values of k varied over exponentially spaced values in the range [2^1, 2^13]. Then for all m and k, we generated 2^24 vectors x with entries in {0, 1} such that ‖x‖2 = k‖x‖∞, and for any given m and k the supports of the vectors were pairwise disjoint."