Approximating Sparse PCA from Incomplete Data

Authors: Abhisek Kundu, Petros Drineas, Malik Magdon-Ismail

NeurIPS 2015

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We demonstrate our algorithms extensively on image, text, biological and financial data. |
| Researcher Affiliation | Academia | Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY, kundua2@rpi.edu. Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY, drinep@cs.rpi.edu. Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY, magdon@cs.rpi.edu. |
| Pseudocode | Yes | Algorithm 1 Hybrid (ℓ1, ℓ2)-Element Sampling |
| Open Source Code | No | No explicit statement or link providing concrete access to the source code for the methodology described in this paper was found. |
| Open Datasets | Yes | Digit Data (m = 2313, n = 256): We use the [7] handwritten zip-code digit images (300 pixels/inch in 8-bit gray scale). TechTC Data (m = 139, n = 15170): We use the Technion Repository of Text Categorization Dataset (TechTC, see [6]) from the Open Directory Project (ODP). Gene Expression Data (m = 107, n = 22215): We use GSE10072 gene expression data for lung cancer from the NCBI Gene Expression Omnibus database. |
| Dataset Splits | No | No specific details on training, validation, or test dataset splits (e.g., percentages, sample counts, or explicit mention of validation sets) were provided. |
| Hardware Specification | No | No specific hardware details (such as GPU/CPU models, memory, or cloud instance types) used for running the experiments were provided. |
| Software Dependencies | No | The paper mentions using the 'Spasm toolbox' but does not provide specific version numbers for Spasm or any other software dependencies like Matlab. |
| Experiment Setup | Yes | We sample approximately 7% of the elements from the centered data using (ℓ1, ℓ2)-sampling, as well as uniform sampling. The performance for small r is shown in Table 1, including the running time τ. For this data, f(Gmax,r)/f(Gsp,r) 0.23 (r = 10). We sample approximately 5% of the elements from the centered data using our (ℓ1, ℓ2)-sampling, as well as uniform sampling. |
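
The experiment setup quoted above (centering the data, then keeping roughly 7% or 5% of its elements by (ℓ1, ℓ2)-sampling or by uniform sampling) can be illustrated with a minimal sketch. It is not the authors' code, which is not released. It assumes that the hybrid distribution is a convex combination of ℓ1 and ℓ2 element weights, p_ij = α|A_ij|/‖A‖_1 + (1 − α)A_ij²/‖A‖_F², that elements are drawn i.i.d. with replacement, and that each draw is rescaled by 1/(s·p_ij). The summary does not state these formulas, and the function names, the mixing parameter `alpha`, and the column-wise centering are hypothetical choices made for illustration.

```python
import random


def center_columns(A):
    """Subtract each column's mean (assumes rows are samples, columns are features)."""
    m, n = len(A), len(A[0])
    means = [sum(A[i][j] for i in range(m)) / m for j in range(n)]
    return [[A[i][j] - means[j] for j in range(n)] for i in range(m)]


def hybrid_probabilities(A, alpha):
    """Assumed hybrid weights: alpha * l1 share + (1 - alpha) * l2 share per element."""
    l1 = sum(abs(x) for row in A for x in row)   # entrywise l1 norm
    l2 = sum(x * x for row in A for x in row)    # squared Frobenius norm
    if l1 == 0:
        raise ValueError("cannot sample from an all-zero matrix")
    return [[alpha * abs(x) / l1 + (1 - alpha) * x * x / l2 for x in row]
            for row in A]


def uniform_probabilities(A):
    """Baseline: every element is equally likely."""
    m, n = len(A), len(A[0])
    return [[1.0 / (m * n)] * n for _ in range(m)]


def sample_sparse_sketch(A, s, probs, rng):
    """Draw s elements with replacement; rescale so the sketch is unbiased for A."""
    m, n = len(A), len(A[0])
    cells = [(i, j) for i in range(m) for j in range(n) if probs[i][j] > 0]
    weights = [probs[i][j] for i, j in cells]
    sketch = [[0.0] * n for _ in range(m)]
    for i, j in rng.choices(cells, weights=weights, k=s):
        sketch[i][j] += A[i][j] / (s * probs[i][j])
    return sketch


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data standing in for a real dataset (hypothetical values).
    data = [[rng.gauss(0, 1) for _ in range(20)] for _ in range(10)]
    centered = center_columns(data)
    s = round(0.07 * 10 * 20)  # about 7% of the elements, as in the quoted setup
    hybrid = sample_sparse_sketch(centered, s, hybrid_probabilities(centered, 0.5), rng)
    uniform = sample_sparse_sketch(centered, s, uniform_probabilities(centered), rng)
    kept = sum(1 for row in hybrid for x in row if x != 0.0)
    print("distinct elements kept by hybrid sampling:", kept)
```

Because draws are made with replacement, the number of distinct elements kept can fall slightly below `s`. A sparse PCA solver would then be run on the sketch instead of the full centered matrix.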