Coding for Random Projections
Authors: Ping Li, Michael Mitzenmacher, Anshumali Shrivastava
ICML 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we study a number of simple coding schemes, focusing on the task of similarity estimation and on an application to training linear classifiers. We demonstrate that uniform quantization outperforms the standard and influential method (Datar et al., 2004), which used a window-and-random offset scheme. Furthermore, we also develop a non-uniform 2-bit coding scheme that generally performs well in practice, as confirmed by our experiments on training linear support vector machines (SVM). Proofs and additional experiments are available at arXiv:1308.2218. |
| Researcher Affiliation | Academia | Ping Li (PINGLI@STAT.RUTGERS.EDU), Dept. of Statistics and Biostatistics, Dept. of Computer Science, Rutgers University, Piscataway, NJ 08854, USA; Michael Mitzenmacher (MICHAELM@EECS.HARVARD.EDU), School of Engineering and Applied Sciences, Harvard University, Cambridge, MA 02138, USA; Anshumali Shrivastava (ANSHU@CS.CORNELL.EDU), Dept. of Computer Science, Computing and Information Science, Cornell University, Ithaca, NY 14853, USA |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper mentions that 'Proofs and additional experiments are available at arXiv:1308.2218' and refers to a 'separate technical report (Li et al., 2014)'. These references point to research papers, not to source code for the methodology described in this paper. |
| Open Datasets | Yes | We conduct experiments with random projections for training (L2-regularized) linear SVM (e.g., LIBLINEAR (Fan et al., 2008)) on three high-dimensional datasets: ARCENE, FARM, URL, which are available from the UCI repository. |
| Dataset Splits | No | The paper specifies training and testing sets ('10000 examples for training and 10000 for testing' for URL, '100 training and 100 testing examples' for ARCENE, '2059 training and 2084 testing examples' for FARM), but it does not explicitly mention a separate validation set or cross-validation setup for these datasets. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments (e.g., CPU/GPU models, memory, or specific cloud instance types). |
| Software Dependencies | No | The paper mentions using 'LIBLINEAR (Fan et al., 2008)' for training linear SVM, but it does not specify a version number for this software or any other software dependencies, which is necessary for reproducibility. |
| Experiment Setup | Yes | Suppose we use h_{w,2} and w = 0.75. We can code an original projected value x into a vector of length 4 (i.e., 2-bit): x ∈ (−∞, −0.75) ⇒ [1 0 0 0], x ∈ [−0.75, 0) ⇒ [0 1 0 0], x ∈ [0, 0.75) ⇒ [0 0 1 0], x ∈ [0.75, ∞) ⇒ [0 0 0 1]. Figure 11 reports the accuracies for a wide range of SVM tuning parameter C values, from 10⁻³ to 10³. Figure 13... The bottom panels of Figure 13 report the w values at which the best accuracies were attained. For h_{w,2}, the optimum w values are close to 0.75. |
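
For context on the schemes compared in the abstract quoted above, the following is a minimal sketch (not from the paper) contrasting uniform quantization with the Datar et al. (2004) window-and-random-offset scheme; the function names and the use of NumPy are our own illustrative assumptions.

```python
import numpy as np

def quantize_uniform(x, w):
    """Uniform quantization: code a projected value x by its
    fixed bin index floor(x / w), with no random offset."""
    return int(np.floor(x / w))

def quantize_offset(x, w, b):
    """Window-and-random-offset scheme (Datar et al., 2004):
    b is drawn once per projection, uniformly from [0, w)."""
    return int(np.floor((x + b) / w))

# Example: the same projected value can land in different bins
# depending on the randomly drawn offset b.
rng = np.random.default_rng(0)
x, w = 1.3, 0.75
b = rng.uniform(0.0, w)
print(quantize_uniform(x, w), quantize_offset(x, w, b))
```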
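The 2-bit scheme h_{w,2} quoted in the Experiment Setup row maps each projected value to a one-hot vector of length 4. Below is a minimal sketch of that mapping, assuming the w = 0.75 boundaries given in the paper; the helper name `code_2bit` is hypothetical, not from the paper.

```python
import numpy as np

def code_2bit(x, w=0.75):
    """2-bit coding h_{w,2}: map a projected value x to a one-hot
    length-4 vector via the bins (-inf,-w), [-w,0), [0,w), [w,inf)."""
    boundaries = np.array([-w, 0.0, w])
    idx = int(np.searchsorted(boundaries, x, side="right"))
    code = np.zeros(4, dtype=np.uint8)
    code[idx] = 1
    return code

# Reproduces the mapping quoted in the table above:
assert (code_2bit(-1.0) == [1, 0, 0, 0]).all()  # x in (-inf, -0.75)
assert (code_2bit(-0.3) == [0, 1, 0, 0]).all()  # x in [-0.75, 0)
assert (code_2bit(0.5)  == [0, 0, 1, 0]).all()  # x in [0, 0.75)
assert (code_2bit(2.0)  == [0, 0, 0, 1]).all()  # x in [0.75, inf)
```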