Coding for Random Projections
Authors: Ping Li, Michael Mitzenmacher, Anshumali Shrivastava
ICML 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we study a number of simple coding schemes, focusing on the task of similarity estimation and on an application to training linear classifiers. We demonstrate that uniform quantization outperforms the standard and influential method (Datar et al., 2004), which used a window-and-random offset scheme. Furthermore, we also develop a non-uniform 2-bit coding scheme that generally performs well in practice, as confirmed by our experiments on training linear support vector machines (SVM). Proofs and additional experiments are available at arXiv:1308.2218. |
| Researcher Affiliation | Academia | Ping Li (PINGLI@STAT.RUTGERS.EDU), Dept. of Statistics and Biostatistics, Dept. of Computer Science, Rutgers University, Piscataway, NJ 08854, USA; Michael Mitzenmacher (MICHAELM@EECS.HARVARD.EDU), School of Engineering and Applied Sciences, Harvard University, Cambridge, MA 02138, USA; Anshumali Shrivastava (ANSHU@CS.CORNELL.EDU), Dept. of Computer Science, Computing and Information Science, Cornell University, Ithaca, NY 14853, USA |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper mentions that 'Proofs and additional experiments are available at arXiv:1308.2218' and refers to a 'separate technical report (Li et al., 2014)'. These references point to research papers, not to source code for the methodology described in this paper. |
| Open Datasets | Yes | We conduct experiments with random projections for training (L2-regularized) linear SVM (e.g., LIBLINEAR (Fan et al., 2008)) on three high-dimensional datasets: ARCENE, FARM, URL, which are available from the UCI repository. |
| Dataset Splits | No | The paper specifies training and testing sets ('10000 examples for training and 10000 for testing' for URL, '100 training and 100 testing examples' for ARCENE, '2059 training and 2084 testing examples' for FARM), but it does not explicitly mention a separate validation set or cross-validation setup for these datasets. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments (e.g., CPU/GPU models, memory, or specific cloud instance types). |
| Software Dependencies | No | The paper mentions using 'LIBLINEAR (Fan et al., 2008)' for training linear SVM, but it does not specify a version number for this software or any other software dependencies, which is necessary for reproducibility. |
| Experiment Setup | Yes | Suppose we use h_{w,2} and w = 0.75. We can code an original projected value x into a vector of length 4 (i.e., 2-bit): x ∈ (−∞, −0.75) ⇒ [1 0 0 0], x ∈ [−0.75, 0) ⇒ [0 1 0 0], x ∈ [0, 0.75) ⇒ [0 0 1 0], x ∈ [0.75, ∞) ⇒ [0 0 0 1]. Figure 11 reports the accuracies for a wide range of SVM tuning parameter C values, from 10⁻³ to 10³. Figure 13... The bottom panels of Figure 13 report the w values at which the best accuracies were attained. For h_{w,2}, the optimum w values are close to 0.75. |
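
For context on the schemes compared in the abstract quoted above, the following is a minimal sketch (not from the paper) contrasting uniform quantization with the Datar et al. (2004) window-and-random-offset scheme; the function names and the use of NumPy are our own illustrative assumptions.

```python
import numpy as np

def quantize_uniform(x, w):
    """Uniform quantization: code a projected value x by its
    fixed bin index floor(x / w), with no random offset."""
    return int(np.floor(x / w))

def quantize_offset(x, w, b):
    """Window-and-random-offset scheme (Datar et al., 2004):
    b is drawn once per projection, uniformly from [0, w)."""
    return int(np.floor((x + b) / w))

# Example: the same projected value can land in different bins
# depending on the randomly drawn offset b.
rng = np.random.default_rng(0)
x, w = 1.3, 0.75
b = rng.uniform(0.0, w)
print(quantize_uniform(x, w), quantize_offset(x, w, b))
```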
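The 2-bit scheme h_{w,2} quoted in the Experiment Setup row maps each projected value to a one-hot vector of length 4. Below is a minimal sketch of that mapping, assuming the w = 0.75 boundaries given in the paper; the helper name `code_2bit` is hypothetical, not from the paper.

```python
import numpy as np

def code_2bit(x, w=0.75):
    """2-bit coding h_{w,2}: map a projected value x to a one-hot
    length-4 vector via the bins (-inf,-w), [-w,0), [0,w), [w,inf)."""
    boundaries = np.array([-w, 0.0, w])
    idx = int(np.searchsorted(boundaries, x, side="right"))
    code = np.zeros(4, dtype=np.uint8)
    code[idx] = 1
    return code

# Reproduces the mapping quoted in the table above:
assert (code_2bit(-1.0) == [1, 0, 0, 0]).all()  # x in (-inf, -0.75)
assert (code_2bit(-0.3) == [0, 1, 0, 0]).all()  # x in [-0.75, 0)
assert (code_2bit(0.5)  == [0, 0, 1, 0]).all()  # x in [0, 0.75)
assert (code_2bit(2.0)  == [0, 0, 0, 1]).all()  # x in [0.75, inf)
```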