How to Fake Multiply by a Gaussian Matrix
Authors: Michael Kapralov, Vamsi Potluru, David Woodruff
ICML 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments. We empirically validate our results for both NMF and SVM applications. For NMF, we give an experimental evaluation by comparing with state-of-the-art algorithms such as SPA (Gillis et al., 2014), XRAY (Kumar et al., 2013), naïve random projections (Damle and Sun, 2014), structured Gaussian random projections (Tepper and Sapiro, 2015), and Tall-Skinny QR factorization (Benson et al., 2014) for NMF problems with applications to breast cancer, flow cytometry, climate data and movie analysis. Also, we show experimental speedups using our projection when combined with linear SVM solvers for document classification problems (Paul et al., 2014). |
| Researcher Affiliation | Collaboration | Michael Kapralov MICHAEL.KAPRALOV@EPFL.CH EPFL, Lausanne, Switzerland Vamsi K. Potluru VAMSI_POTLURU@CABLE.COMCAST.COM Comcast Cable, Washington DC, USA 20005 David P. Woodruff DPWOODRU@US.IBM.COM IBM Research, Almaden, San Jose, CA USA |
| Pseudocode | Yes | Algorithm 1 Count Gauss NMF (CG) Initialize the index sets Imax, Imin to empty. |
| Open Source Code | Yes | In all our experiments¹, we set B = 5m. [...] ¹ https://github.com/marinkaz/nimfa |
| Open Datasets | Yes | Gene expression breast cancer dataset. We utilize the hereditary breast cancer dataset collected by Hedenfalk et al. (2001) which consists of the expression levels of 3226 genes on 22 samples from breast cancer patients. [...] TechTC-300 Dataset. We obtained the TechTC-300 dataset which is a comprehensive directory of the web. There are 295 pairs of categories which provides a rich framework for running SVM experiments (Paul et al., 2014). |
| Dataset Splits | Yes | The results are shown over 10-fold cross validation with 4 repetitions and 3 runs over the random projection matrices. |
| Hardware Specification | No | The paper states that certain calculations “can be solved in a couple of seconds on an off-the-shelf desktop,” but this is too vague to be a specific hardware specification. No detailed hardware information (e.g., CPU, GPU models, memory) used for experiments is provided. |
| Software Dependencies | No | The paper mentions that “LIBSVM was used with a linear kernel” for SVM experiments. However, it does not specify the version number of LIBSVM or any other software dependencies. |
| Experiment Setup | Yes | In all our experiments¹, we set B = 5m. [...] LIBSVM was used with a linear kernel and soft-margin parameter C set to 500 for all experiments and we set the projections to 128, 256, and 512. |
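
The experiment setup above fixes B = 5m and projection sizes of 128, 256, and 512, but the table does not show the projection itself. The following is a minimal sketch, assuming the transform is a CountSketch into B buckets followed by a dense Gaussian map down to m dimensions. The function name `count_gauss_project`, its arguments, and the 1/sqrt(m) scaling are illustrative choices, not taken from the paper's code.

```python
import numpy as np


def count_gauss_project(X, m, B=None, seed=0):
    """Project the rows of X (n_samples x d) down to m dimensions.

    Hypothetical helper: CountSketch into B buckets, then a dense
    Gaussian map from B to m dimensions. B defaults to 5 * m,
    matching the setting quoted in the table.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    if B is None:
        B = 5 * m

    # CountSketch: each input coordinate gets one random bucket and one random sign.
    buckets = rng.integers(0, B, size=d)
    signs = rng.choice([-1.0, 1.0], size=d)

    # Accumulate signed coordinates into their buckets. np.add.at is used
    # because plain fancy-index assignment would drop collisions.
    sketched_t = np.zeros((B, n))
    np.add.at(sketched_t, buckets, (X * signs).T)

    # Dense Gaussian map applied to the (much smaller) sketched data.
    # Scaling by 1/sqrt(m) keeps squared norms unchanged in expectation.
    G = rng.standard_normal((B, m)) / np.sqrt(m)
    return sketched_t.T @ G


# Example: reduce 20 points in 10,000 dimensions to each size from the table.
X = np.random.default_rng(1).standard_normal((20, 10_000))
for m in (128, 256, 512):
    Z = count_gauss_project(X, m)
    print(m, Z.shape)
```

The point of the two-stage design is cost: the sketching step touches each entry of `X` once, so the dense Gaussian multiplication only ever sees B columns instead of d.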