Quantized Random Projections and Non-Linear Estimation of Cosine Similarity

Authors: Ping Li, Michael Mitzenmacher, Martin Slawski

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present experimental results concerning applications of the proposed approach in nearest neighbor search and linear classification. In nearest neighbor search, we focus on the high similarity regime and confirm theoretical insights into the trade-off between k and b. For linear classification, we observe empirically that intermediate values of b can yield better trade-offs than single-bit quantization.
Researcher Affiliation | Academia | Ping Li (Rutgers University, pingli@stat.rutgers.edu); Michael Mitzenmacher (Harvard University, michaelm@eecs.harvard.edu); Martin Slawski (Rutgers University, martin.slawski@rutgers.edu)
Pseudocode | No | The paper describes the computational steps of the MLE approximation but does not present them in a structured pseudocode block or an explicitly labeled 'Algorithm' section.
Open Source Code | No | The paper provides no statement or link indicating that source code for the described methodology is publicly available.
Open Datasets | Yes | Real data. We consider the Farm Ads data set (n = 4,143, d = 54,877) from the UCI repository and the RCV1 data set (n = 20,242, d = 47,236) from the LIBSVM webpage [3].
Dataset Splits | No | The paper specifies training and test sample counts for some datasets (e.g., '3,000 samples for training' for the Farm Ads data; '100 training and 100 test samples' for Arcene), but it does not describe any validation splits (percentages, counts, or a cross-validation setup) needed to reproduce the experiments.
Hardware Specification | No | The paper does not provide specific details regarding the hardware (e.g., GPU/CPU models, memory, or specific cloud instances) used to run the experiments.
Software Dependencies | No | The paper mentions LIBSVM as a tool used but does not provide version numbers for it or any other software dependencies, which would be necessary for full reproducibility.
Experiment Setup | Yes | For SVM classification, we consider logarithmically spaced grids between 10^-3 and 10^3 for the parameter C (cf. LIBSVM manual).
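The b = 1 special case of the quantized random projections discussed above admits a well-known closed-form estimator. A minimal sketch, assuming NumPy and standard sign random projections (SimHash) with the classical collision probability P(match) = 1 - theta/pi; this is not the paper's general b-bit MLE:

```python
import numpy as np

rng = np.random.default_rng(0)

def sign_sketch(x, W):
    """1-bit quantization of k Gaussian random projections (b = 1 case)."""
    return (W @ x) >= 0

def estimate_cosine(x, y, W):
    """Estimate cos(x, y) from the agreement of the 1-bit sketches,
    inverting P(match) = 1 - theta/pi for the angle theta."""
    match_frac = np.mean(sign_sketch(x, W) == sign_sketch(y, W))
    return np.cos(np.pi * (1.0 - match_frac))

# Example with k = 10,000 projections in d = 50 dimensions.
d, k = 50, 10_000
W = rng.standard_normal((k, d))
x, y = rng.standard_normal(d), rng.standard_normal(d)
true_cos = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
est = estimate_cosine(x, y, W)
```

Larger k tightens the estimate; the paper's trade-off between k and b asks whether those projections are better spent on more measurements or on finer (b-bit) quantization of each one.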
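As an aside on the LIBSVM-format downloads mentioned in the datasets row, a minimal sketch of reading such a file with scikit-learn's `load_svmlight_file`; the two-line sample below is synthetic, not real RCV1 or Farm Ads data:

```python
import io
from sklearn.datasets import load_svmlight_file

# Synthetic two-row example in LIBSVM sparse format:
# <label> <index>:<value> ... with one-based feature indices.
sample = b"+1 3:0.5 7:1.2\n-1 1:0.8 3:0.1\n"

# zero_based=False matches the one-based indexing of the sample above.
X, y = load_svmlight_file(io.BytesIO(sample), zero_based=False)
# X is a sparse CSR matrix of shape (2, 7); y holds the +1/-1 labels.
```

For the real data, the same call would point at the downloaded RCV1 or Farm Ads file instead of the in-memory sample.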
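The C grid from the experiment setup row can be generated directly. A sketch with NumPy, assuming one grid point per decade (the paper states only the endpoints, not the grid density):

```python
import numpy as np

# Logarithmically spaced SVM cost grid spanning 10^-3 to 10^3.
# Seven points (one per decade) is an assumption; the paper only
# gives the endpoints of the grid.
C_grid = np.logspace(-3, 3, num=7)

# Each value would be passed to LIBSVM as `-c <value>` during model selection.
```

Cross-validation accuracy would then be compared across the seven candidate values of C.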