Quantized Random Projections and Non-Linear Estimation of Cosine Similarity
Authors: Ping Li, Michael Mitzenmacher, Martin Slawski
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present experimental results concerning applications of the proposed approach in nearest neighbor search and linear classification. In nearest neighbor search, we focus on the high similarity regime and confirm theoretical insights into the trade-off between k and b. For linear classification, we observe empirically that intermediate values of b can yield better trade-offs than single-bit quantization. |
| Researcher Affiliation | Academia | Ping Li, Rutgers University, pingli@stat.rutgers.edu; Michael Mitzenmacher, Harvard University, michaelm@eecs.harvard.edu; Martin Slawski, Rutgers University, martin.slawski@rutgers.edu |
| Pseudocode | No | The paper describes computational steps for MLE approximation but does not present them in a structured pseudocode block or explicitly labeled 'Algorithm' section. |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for its described methodology is publicly available. |
| Open Datasets | Yes | Real data. We consider the Farm Ads data set (n = 4,143, d = 54,877) from the UCI repository and the RCV1 data set (n = 20,242, d = 47,236) from the LIBSVM webpage [3]. |
| Dataset Splits | No | The paper specifies training and test sample counts for some datasets (e.g., '3,000 samples for training' for farm data, '100 training and 100 test samples' for Arcene), but it does not provide details on validation splits (e.g., percentages, counts, or a cross-validation setup) for reproducing the experiment. |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware (e.g., GPU/CPU models, memory, or specific cloud instances) used to run the experiments. |
| Software Dependencies | No | The paper mentions 'LIBSVM' as a tool used but does not provide specific version numbers for it or any other software dependencies, which would be necessary for full reproducibility. |
| Experiment Setup | Yes | For SVM classification, we consider logarithmically spaced grids between $10^{-3}$ and $10^{3}$ for the parameter C (cf. LIBSVM manual). |
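
The logarithmically spaced grid for C mentioned under Experiment Setup can be sketched as follows. This is a minimal illustration, not the authors' code: the helper name `log_spaced_grid` is hypothetical, the endpoints are assumed to be 10^-3 and 10^3, and the choice of seven grid points is an assumption, since the number of points is not stated.

```python
def log_spaced_grid(low_exp=-3.0, high_exp=3.0, num=7):
    """Return `num` values evenly spaced in log10 from 10**low_exp to 10**high_exp.

    `num=7` is an illustrative assumption; the grid size is not given in the source.
    """
    if num < 2:
        raise ValueError("num must be at least 2")
    # Constant step in the exponent gives a constant ratio between neighbors.
    step = (high_exp - low_exp) / (num - 1)
    return [10.0 ** (low_exp + i * step) for i in range(num)]


# Candidate values for the SVM regularization parameter C.
c_grid = log_spaced_grid()
```

Each value in `c_grid` would then be tried as the parameter C when training the SVM, with the best value typically selected on held-out data.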