Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Approximating Sparse PCA from Incomplete Data
Authors: Abhisek Kundu, Petros Drineas, Malik Magdon-Ismail
NeurIPS 2015 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate our algorithms extensively on image, text, biological and financial data. |
| Researcher Affiliation | Academia | Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY, EMAIL. Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY, EMAIL. Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY, EMAIL. |
| Pseudocode | Yes | Algorithm 1 Hybrid (ℓ1, ℓ2)-Element Sampling |
| Open Source Code | No | No explicit statement or link providing concrete access to the source code for the methodology described in this paper was found. |
| Open Datasets | Yes | Digit Data (m = 2313, n = 256): We use the [7] handwritten zip-code digit images (300 pixels/inch in 8-bit gray scale). Tech TC Data (m = 139, n = 15170): We use the Technion Repository of Text Categorization Dataset (Tech TC, see [6]) from the Open Directory Project (ODP). Gene Expression Data (m = 107, n = 22215): We use GSE10072 gene expression data for lung cancer from the NCBI Gene Expression Omnibus database. |
| Dataset Splits | No | No specific details on training, validation, or test dataset splits (e.g., percentages, sample counts, or explicit mention of validation sets) were provided. |
| Hardware Specification | No | No specific hardware details (such as GPU/CPU models, memory, or cloud instance types) used for running the experiments were provided. |
| Software Dependencies | No | The paper mentions using the 'Spasm toolbox' but does not provide specific version numbers for Spasm or any other software dependencies like Matlab. |
| Experiment Setup | Yes | We sample approximately 7% of the elements from the centered data using (ℓ1, ℓ2)-sampling, as well as uniform sampling. The performance for small r is shown in Table 1, including the running time τ. For this data, f(Gmax,r)/f(Gsp,r) 0.23 (r = 10). We sample approximately 5% of the elements from the centered data using our (ℓ1, ℓ2)-sampling, as well as uniform sampling. |
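
The hybrid element sampling named in the Pseudocode and Experiment Setup rows can be illustrated with a minimal sketch. This is not the authors' Algorithm 1: the page does not state the sampling probabilities, so the sketch assumes a convex mix of absolute-value and squared-magnitude probabilities with a hypothetical mixing weight `alpha`, draws entries i.i.d. with replacement, and rescales them so the sparse sketch is an unbiased estimate of the input. The function names, `alpha`, and the synthetic data are all illustrative assumptions.

```python
import numpy as np


def hybrid_element_sample(A, s, alpha=0.5, rng=None):
    """Sparse sketch of A from s entries drawn i.i.d. with replacement.

    ASSUMPTION (not taken from this page): entry (i, j) is drawn with
    probability alpha * |A_ij| / sum|A| + (1 - alpha) * A_ij^2 / sum(A^2).
    A must contain at least one nonzero entry.
    """
    rng = np.random.default_rng(rng)
    flat = np.asarray(A, dtype=float).ravel()
    p_abs = np.abs(flat) / np.abs(flat).sum()   # absolute-value probabilities
    p_sq = flat**2 / (flat**2).sum()            # squared-magnitude probabilities
    p = alpha * p_abs + (1.0 - alpha) * p_sq
    p = p / p.sum()                             # guard against rounding drift
    idx = rng.choice(flat.size, size=s, replace=True, p=p)
    sketch = np.zeros(flat.size)
    # Each draw adds A_ij / (s * p_ij), which makes E[sketch] equal to A.
    np.add.at(sketch, idx, flat[idx] / (s * p[idx]))
    return sketch.reshape(np.shape(A))


def uniform_element_sample(A, s, rng=None):
    """Baseline: the same estimator with uniform entry probabilities."""
    rng = np.random.default_rng(rng)
    flat = np.asarray(A, dtype=float).ravel()
    idx = rng.choice(flat.size, size=s, replace=True)
    sketch = np.zeros(flat.size)
    np.add.at(sketch, idx, flat[idx] * flat.size / s)
    return sketch.reshape(np.shape(A))


# Usage on synthetic data: center the columns, then sample about 7% of entries.
X = np.random.default_rng(0).normal(size=(20, 30))
A = X - X.mean(axis=0)
budget = int(0.07 * A.size)
S_hybrid = hybrid_element_sample(A, budget, alpha=0.5, rng=1)
S_uniform = uniform_element_sample(A, budget, rng=1)
```

Because sampling is with replacement, the sketch can hold fewer than `budget` distinct nonzero entries; a sparse PCA solver would then be run on the sketch in place of the full centered matrix.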