Quantized Random Projections and Non-Linear Estimation of Cosine Similarity

Authors: Ping Li, Michael Mitzenmacher, Martin Slawski

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present experimental results concerning applications of the proposed approach in nearest neighbor search and linear classification. In nearest neighbor search, we focus on the high similarity regime and confirm theoretical insights into the trade-off between k and b. For linear classification, we observe empirically that intermediate values of b can yield better trade-offs than single-bit quantization.
Researcher Affiliation | Academia | Ping Li (Rutgers University, pingli@stat.rutgers.edu); Michael Mitzenmacher (Harvard University, michaelm@eecs.harvard.edu); Martin Slawski (Rutgers University, martin.slawski@rutgers.edu)
Pseudocode | No | The paper describes the computational steps of the MLE approximation but does not present them in a structured pseudocode block or an explicitly labeled 'Algorithm' section.
Open Source Code | No | The paper provides no statement or link indicating that source code for the described methodology is publicly available.
Open Datasets | Yes | Real data. We consider the Farm Ads data set (n = 4,143, d = 54,877) from the UCI repository and the RCV1 data set (n = 20,242, d = 47,236) from the LIBSVM webpage [3].
Dataset Splits | No | The paper specifies training and test sample counts for some datasets (e.g., '3,000 samples for training' for the Farm Ads data; '100 training and 100 test samples' for Arcene), but it does not describe any validation splits (percentages, counts, or a cross-validation setup) needed to reproduce the experiments.
Hardware Specification | No | The paper does not provide specific details regarding the hardware (e.g., GPU/CPU models, memory, or specific cloud instances) used to run the experiments.
Software Dependencies | No | The paper mentions LIBSVM as a tool used but does not provide version numbers for it or any other software dependencies, which would be necessary for full reproducibility.
Experiment Setup | Yes | For SVM classification, we consider logarithmically spaced grids between 10^-3 and 10^3 for the parameter C (cf. LIBSVM manual).
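The b = 1 special case of the quantized random projections discussed above admits a well-known closed-form estimator. A minimal sketch, assuming NumPy and standard sign random projections (SimHash) with the classical collision probability P(match) = 1 - theta/pi; this is not the paper's general b-bit MLE:

```python
import numpy as np

rng = np.random.default_rng(0)

def sign_sketch(x, W):
    """1-bit quantization of k Gaussian random projections (b = 1 case)."""
    return (W @ x) >= 0

def estimate_cosine(x, y, W):
    """Estimate cos(x, y) from the agreement of the 1-bit sketches,
    inverting P(match) = 1 - theta/pi for the angle theta."""
    match_frac = np.mean(sign_sketch(x, W) == sign_sketch(y, W))
    return np.cos(np.pi * (1.0 - match_frac))

# Example with k = 10,000 projections in d = 50 dimensions.
d, k = 50, 10_000
W = rng.standard_normal((k, d))
x, y = rng.standard_normal(d), rng.standard_normal(d)
true_cos = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
est = estimate_cosine(x, y, W)
```

Larger k tightens the estimate; the paper's trade-off between k and b asks whether those projections are better spent on more measurements or on finer (b-bit) quantization of each one.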
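As an aside on the LIBSVM-format downloads mentioned in the datasets row, a minimal sketch of reading such a file with scikit-learn's `load_svmlight_file`; the two-line sample below is synthetic, not real RCV1 or Farm Ads data:

```python
import io
from sklearn.datasets import load_svmlight_file

# Synthetic two-row example in LIBSVM sparse format:
# <label> <index>:<value> ... with one-based feature indices.
sample = b"+1 3:0.5 7:1.2\n-1 1:0.8 3:0.1\n"

# zero_based=False matches the one-based indexing of the sample above.
X, y = load_svmlight_file(io.BytesIO(sample), zero_based=False)
# X is a sparse CSR matrix of shape (2, 7); y holds the +1/-1 labels.
```

For the real data, the same call would point at the downloaded RCV1 or Farm Ads file instead of the in-memory sample.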
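The C grid from the experiment setup row can be generated directly. A sketch with NumPy, assuming one grid point per decade (the paper states only the endpoints, not the grid density):

```python
import numpy as np

# Logarithmically spaced SVM cost grid spanning 10^-3 to 10^3.
# Seven points (one per decade) is an assumption; the paper only
# gives the endpoints of the grid.
C_grid = np.logspace(-3, 3, num=7)

# Each value would be passed to LIBSVM as `-c <value>` during model selection.
```

Cross-validation accuracy would then be compared across the seven candidate values of C.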