Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

How to Fake Multiply by a Gaussian Matrix

Authors: Michael Kapralov, Vamsi Potluru, David Woodruff

ICML 2016 | Venue PDF | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Experiments. We empirically validate our results for both NMF and SVM applications. For NMF, we give an experimental evaluation by comparing with state-of-the-art algorithms such as SPA (Gillis et al., 2014), XRAY (Kumar et al., 2013), naïve random projections (Damle and Sun, 2014), structured Gaussian random projections (Tepper and Sapiro, 2015), and Tall-Skinny QR factorization (Benson et al., 2014) for NMF problems with applications to breast cancer, flow cytometry, climate data and movie analysis. Also, we show experimental speedups using our projection when combined with linear SVM solvers for document classification problems (Paul et al., 2014). |
| Researcher Affiliation | Collaboration | Michael Kapralov (EMAIL), EPFL, Lausanne, Switzerland; Vamsi K. Potluru (EMAIL), Comcast Cable, Washington DC, USA 20005; David P. Woodruff (EMAIL), IBM Research, Almaden, San Jose, CA, USA |
| Pseudocode | Yes | Algorithm 1 Count Gauss NMF (CG): Initialize the index sets I_max, I_min to empty. |
| Open Source Code | Yes | In all our experiments¹, we set B = 5m. [...] ¹ https://github.com/marinkaz/nimfa |
| Open Datasets | Yes | Gene expression breast cancer dataset. We utilize the hereditary breast cancer dataset collected by Hedenfalk et al. (2001) which consists of the expression levels of 3226 genes on 22 samples from breast cancer patients. [...] Tech TC-300 Dataset. We obtained the Tech TC300 dataset which is a comprehensive directory of the web. There are 295-pairs of categories which provides a rich framework for running SVM experiments (Paul et al., 2014). |
| Dataset Splits | Yes | The results are shown over 10-fold cross validation with 4 repetitions and 3 runs over the random projection matrices. |
| Hardware Specification | No | The paper states that certain calculations “can be solved in a couple of seconds on an off-the-shelf desktop,” but this is too vague to be a specific hardware specification. No detailed hardware information (e.g., CPU, GPU models, memory) used for experiments is provided. |
| Software Dependencies | No | The paper mentions that “LIBSVM was used with a linear kernel” for SVM experiments. However, it does not specify the version number of LIBSVM or any other software dependencies. |
| Experiment Setup | Yes | In all our experiments¹, we set B = 5m. [...] LIBSVM was used with a linear kernel and soft-margin parameter C set to 500 for all experiments and we set the projections to 128, 256, and 512. |
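
The quoted SVM settings (C = 500, projections of 128, 256, and 512, B = 5m, 10-fold cross validation with 4 repetitions and 3 runs over the random projection matrices) can be read as an evaluation grid. The following is a minimal sketch of one way to enumerate that grid, not the authors' code. Several points are assumptions: that m is the projection dimension, that folds are reshuffled once per repetition and reused across projection runs, and all function and key names (`kfold_indices`, `experiment_grid`, the dictionary keys), which are hypothetical.

```python
import itertools
import random

# Values quoted in the table above.
C = 500                            # soft-margin parameter for the linear SVM
PROJECTION_DIMS = (128, 256, 512)  # projection sizes
N_FOLDS = 10                       # 10-fold cross validation
N_REPETITIONS = 4                  # repetitions of the cross validation
N_PROJECTION_RUNS = 3              # runs over the random projection matrices


def kfold_indices(n_samples, n_folds, rng):
    """Shuffle sample indices and split them into n_folds test folds."""
    idx = list(range(n_samples))
    rng.shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]


def experiment_grid(n_samples, seed=0):
    """Yield one configuration per (m, repetition, projection run, fold).

    Assumption: folds are redrawn for each repetition and shared by the
    projection runs; the source does not say how these loops are nested.
    """
    rng = random.Random(seed)
    for m, rep in itertools.product(PROJECTION_DIMS, range(N_REPETITIONS)):
        folds = kfold_indices(n_samples, N_FOLDS, rng)
        for run in range(N_PROJECTION_RUNS):
            for fold_id, test_idx in enumerate(folds):
                held_out = set(test_idx)
                train_idx = [i for i in range(n_samples) if i not in held_out]
                yield {
                    "m": m,
                    "B": 5 * m,  # B = 5m, assuming m is the projection size
                    "C": C,
                    "repetition": rep,
                    "projection_run": run,
                    "fold": fold_id,
                    "train": train_idx,
                    "test": test_idx,
                }
```

Under these assumptions, each projection size is evaluated on 4 × 3 × 10 = 120 train/test splits, for 360 configurations in total. A reproduction would train a linear SVM on the projected `train` rows of each configuration and score it on the `test` rows.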