Understanding Sparse JL for Feature Hashing

Authors: Meena Jagadeesan

Venue: NeurIPS 2019

Reproducibility assessment: for each variable below, the result is given first, followed by the LLM response.
Research Type: Experimental. Our result theoretically demonstrates that sparse JL with s > 1 can have significantly better norm-preservation properties on feature vectors than sparse JL with s = 1, and we empirically support these theoretical findings (Theorem 1.5). First, we illustrate with real-world datasets the potential benefits of using small constants s > 1 for sparse JL on feature vectors. We specifically show that s ∈ {4, 8, 16} consistently outperforms s = 1 in preserving the ℓ2 norm of each vector, and that there can be up to a factor-of-ten decrease in failure probability for s = 8, 16 in comparison to s = 1. Second, we use synthetic data to illustrate phase transitions and other trends in Theorem 1.5.
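
The comparison above concerns how well a sparse JL transform with sparsity s preserves the ℓ2 norm of a feature vector. The following minimal sketch (illustrative only, not the authors' code) draws the standard sparse JL matrix, in which every column has exactly s nonzero entries equal to ±1/√s placed in distinct rows, and applies it to a hypothetical feature vector with one heavy coordinate; the dimensions and vector values are assumptions chosen for illustration, not settings from the paper.

```python
import numpy as np

def sparse_jl_matrix(m, n, s, rng):
    """m x n sparse JL matrix: each column has exactly s nonzero entries
    equal to +/- 1/sqrt(s), placed in s distinct rows."""
    A = np.zeros((m, n))
    for j in range(n):
        rows = rng.choice(m, size=s, replace=False)
        A[rows, j] = rng.choice([-1.0, 1.0], size=s) / np.sqrt(s)
    return A

rng = np.random.default_rng(0)
# Hypothetical feature vector with one heavy coordinate (illustrative only).
x = np.zeros(10_000)
x[0] = 1.0
x[1:200] = 0.05
for s in (1, 4, 8, 16):
    A = sparse_jl_matrix(m=512, n=x.size, s=s, rng=rng)
    ratio = np.linalg.norm(A @ x) / np.linalg.norm(x)
    print(f"s={s:2d}  ||Ax|| / ||x|| = {ratio:.3f}")
```

Repeating such draws many times and counting how often | ||Ax||² − ||x||² | exceeds ϵ||x||² gives the empirical failure probability that the paper compares across values of s.
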
Researcher Affiliation: Academia. Meena Jagadeesan, Harvard University, Cambridge, MA 02138, mjagadeesan@college.harvard.edu.
Pseudocode: No. No pseudocode or algorithm blocks are present.
Open Source Code: No. The paper notes that a variant of sparse JL is included in the Python sklearn library (footnote 3: https://scikit-learn.org/stable/modules/random_projection.html), but it does not provide access to the authors' own source code for the methodology described in the paper.
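
For context, the sklearn variant mentioned in that footnote can be exercised as follows. Note that sklearn's SparseRandomProjection makes each entry nonzero independently with probability density, rather than placing exactly s nonzeros per column, so it is only a variant of the construction analyzed in the paper; setting density = s/m gives roughly s expected nonzeros per column, and all parameter values below are illustrative assumptions.

```python
import numpy as np
from sklearn.random_projection import SparseRandomProjection

rng = np.random.default_rng(0)
X = rng.random((100, 10_000))   # 100 illustrative dense feature vectors

m, s = 512, 8                   # target dimension and nominal sparsity (illustrative)
proj = SparseRandomProjection(n_components=m, density=s / m, random_state=0)
Y = proj.fit_transform(X)       # density = s/m gives ~s expected nonzeros per column

ratios = np.linalg.norm(Y, axis=1) / np.linalg.norm(X, axis=1)
print(f"norm ratios: min={ratios.min():.3f}, max={ratios.max():.3f}")
```
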
Open Datasets: Yes. "We considered two bag-of-words datasets: the News20 dataset [1] (based on newsgroup documents), and the Enron email dataset [26] (based on e-mails from the senior management of Enron)."
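
As a stand-in (not necessarily the exact preprocessing used in the paper), a comparable bag-of-words representation of the newsgroup corpus can be loaded through sklearn's built-in fetcher; the Enron e-mail corpus would need to be obtained and vectorized separately.

```python
from sklearn.datasets import fetch_20newsgroups_vectorized

# Sparse bag-of-words matrix (documents x vocabulary); this loader serves as a
# convenient stand-in for the News20 data referenced above.
bunch = fetch_20newsgroups_vectorized(subset="all")
X = bunch.data
print("documents:", X.shape[0], "vocabulary size:", X.shape[1])
print("average nonzero features per document: %.1f" % X.getnnz(axis=1).mean())
```
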
Dataset Splits: No. The paper does not provide explicit training/validation/test splits; it states only that the failure probability was estimated "for each dataset".
Hardware Specification: No. No hardware details (such as GPU/CPU models or memory) used to run the experiments are mentioned.
Software Dependencies: No. The paper mentions use of the Python sklearn library but does not specify version numbers for sklearn or Python, which would be necessary for reproducibility.
Experiment Setup: Yes. "We consider s ∈ {1, 2, 4, 8, 16}, and choose m values so that 0.01 ≤ δ̂(1, m, ϵ) ≤ 0.04. We computed each δ̂(s, m, ϵ, w) using 100,000 samples from a block sparse JL distribution. Our synthetic data consisted of binary vectors (i.e. vectors whose entries are in {0, 1})."
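
Under this setup, the empirical failure probability δ̂(s, m, ϵ, w) can be estimated by Monte Carlo roughly as sketched below. The block construction (rows split into s blocks of m/s rows, one ±1/√s entry per block in each column) follows the standard block sparse JL distribution the quote refers to; the specific values of w, m, ϵ and the reduced sample count are illustrative assumptions rather than the paper's settings.

```python
import numpy as np

def block_sparse_jl(m, n, s, rng):
    """Block sparse JL: the m rows are split into s blocks of m/s rows; each
    column gets one +/- 1/sqrt(s) entry in a uniformly random row of every block."""
    assert m % s == 0
    block = m // s
    A = np.zeros((m, n))
    offsets = block * np.arange(s)
    for j in range(n):
        rows = offsets + rng.integers(0, block, size=s)
        A[rows, j] = rng.choice([-1.0, 1.0], size=s) / np.sqrt(s)
    return A

def delta_hat(w, m, s, eps, samples, seed=0):
    """Empirical failure probability for a unit-normalized binary vector with
    w ones: fraction of draws with | ||Ax||^2 - 1 | > eps. Only the w columns
    in the vector's support affect Ax, so only those are generated."""
    rng = np.random.default_rng(seed)
    x = np.ones(w) / np.sqrt(w)
    fails = 0
    for _ in range(samples):
        y = block_sparse_jl(m, w, s, rng) @ x
        fails += abs(y @ y - 1.0) > eps
    return fails / samples

# Illustrative parameters; the paper estimates each delta-hat from 100,000 samples.
w, m, eps = 32, 256, 0.1
for s in (1, 2, 4, 8, 16):
    print(f"s={s:2d}  delta_hat={delta_hat(w, m, s, eps, samples=2000):.4f}")
```

Sweeping m (chosen so that δ̂(1, m, ϵ) falls in the stated range) and w would reproduce the kind of trends, including the phase transitions of Theorem 1.5, that the paper illustrates on synthetic binary vectors.
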