Fully Understanding The Hashing Trick

Authors: Casper B. Freksen, Lior Kamma, Kasper Green Larsen

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our theoretical bounds are accompanied by empirical results that shed light on the nature of the constants in Theorem 2. Our empirical results show that in practice the constants inside the Θ-notation are significantly tighter than the theoretical proof might suggest, and in fact feature hashing performs well for a larger scope of vectors."
Researcher Affiliation | Academia | Casper Freksen, Department of Computer Science, Aarhus University, Denmark, cfreksen@cs.au.dk; Lior Kamma, Department of Computer Science, Aarhus University, Denmark, lior.kamma@cs.au.dk; Kasper Green Larsen, Department of Computer Science, Aarhus University, Denmark, larsen@cs.au.dk
Pseudocode | No | The paper describes its methods mathematically and textually but contains no structured pseudocode or algorithm blocks with explicit labels such as "Algorithm" or "Pseudocode".
Open Source Code | No | The paper provides no concrete access to source code for the methodology described, nor does it include a link to a code repository or an explicit statement of code release.
Open Datasets | Yes | "We also ran experiments on real-world data, namely bag-of-words representations of 1500 NIPS papers with stopwords removed [DKT17b, New08]."
Dataset Splits | No | The paper describes generating synthetic data and using real-world datasets to evaluate the properties of feature hashing, but it does not specify any training, validation, or test splits in the context of model training and evaluation.
Hardware Specification | No | The paper does not provide specific hardware details (such as GPU/CPU models, processor types, or memory amounts) used to run its experiments.
Software Dependencies | No | The paper names the algorithms and methods used (e.g., "double tabulation hashing") but does not list ancillary software dependencies with version numbers (e.g., Python or library versions) needed for replication.
Experiment Setup | Yes | "In the first phase we varied the target dimension m over exponentially spaced values in the range [2^4, 2^14], and a parameter k which controls the ratio between the ℓ∞ and the ℓ2 norms. The values of k varied over exponentially spaced values in the range [2^1, 2^13]. Then for all m and k, we generated 2^24 vectors x with entries in {0, 1} such that ‖x‖2 = k‖x‖∞, and for any given m and k the supports of the vectors were pairwise disjoint."