Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Column Selection via Adaptive Sampling
Authors: Saurabh Paul, Malik Magdon-Ismail, Petros Drineas
NeurIPS 2015 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results on synthetic and real-world data show that our algorithm outperforms non-adaptive sampling as well as prior adaptive sampling approaches. |
| Researcher Affiliation | Collaboration | Saurabh Paul Global Risk Sciences, Paypal Inc. EMAIL Malik Magdon-Ismail CS Dept., Rensselaer Polytechnic Institute EMAIL Petros Drineas CS Dept., Rensselaer Polytechnic Institute EMAIL |
| Pseudocode | Yes | Algorithm 1: Adaptive Sampling. Input: $A \in \mathbb{R}^{m \times n}$; target rank $k$; # rounds $t$; columns per round $c$. Output: $C \in \mathbb{R}^{m \times tc}$, $tc$ columns of $A$, and $S$, the indices of those columns. 1: $S = \{\}$; $E_0 = A$ 2: for $\ell = 1, \ldots, t$ do 3: Sample indices $S_\ell$ of $c$ columns from $E_{\ell-1}$ using a CSSP-algorithm. 4: $S \leftarrow S \cup S_\ell$. 5: Set $C = A_S$ and $E_\ell = A - (CC^+A)_{\ell k}$. 6: return $C$, $S$ |
| Open Source Code | No | The paper states 'We implemented our algorithm using two relative-error column selection algorithms' but does not provide concrete access to source code or explicitly state its release. |
| Open Datasets | Yes | HGDP 22 chromosomes: SNPs human chromosome data from the HGDP database [26]. We use all 22 chromosome matrices (1043 rows; 7,334-37,493 columns) and report the average. Each matrix contains +1, 0, −1 entries, and we randomly filled in missing entries. TechTC-300: 49 document-term matrices [27] (150-300 rows (documents); 10,000-40,000 columns (words)). |
| Dataset Splits | No | The paper does not provide specific dataset split information for validation. |
| Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments. |
| Software Dependencies | No | The paper mentions using specific algorithms ('near-optimal column selection algorithm of [18, 15]' and 'leverage-score sampling algorithm of [19]') but does not list specific software dependencies with version numbers. |
| Experiment Setup | Yes | We set the target rank k = 5 and the number of columns in each round to c = 2k. We have tried several choices for k and c and the results are qualitatively identical so we only report on one choice. For randomized algorithms, we repeat the experiments five times and take the average. |
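
The adaptive sampling loop quoted in the Pseudocode row can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation (no source code was released). It rests on several assumptions: the truncation rank in step 5 is read as ℓ·k; the CSSP step is replaced by plain norm-squared column sampling (`norm_squared_sampler`, a hypothetical stand-in for the near-optimal and leverage-score algorithms the paper cites); and the names `adaptive_sampling` and `residual_error` are invented for this example. The demo at the bottom mirrors the reported setup (k = 5, c = 2k, five repetitions averaged) on synthetic data, with t = 3 rounds chosen arbitrarily.

```python
import numpy as np


def norm_squared_sampler(E, c, rng):
    """Stand-in CSSP step (an assumption, not the paper's sampler):
    draw c distinct columns with probability proportional to squared norm."""
    weights = np.sum(E * E, axis=0)
    if np.count_nonzero(weights) < c:
        # Degenerate residual: flatten the weights so c distinct draws exist.
        weights = weights + 1.0
    return rng.choice(E.shape[1], size=c, replace=False, p=weights / weights.sum())


def adaptive_sampling(A, k, t, c, sampler=norm_squared_sampler, seed=0):
    """Select up to t*c columns of A over t adaptive rounds."""
    rng = np.random.default_rng(seed)
    S = []        # step 1: S = {}
    E = A.copy()  # step 1: E_0 = A
    for ell in range(1, t + 1):
        new = sampler(E, c, rng)                    # step 3: sample from E_{l-1}
        S = sorted(set(S) | {int(j) for j in new})  # step 4: S <- S u S_l
        C = A[:, S]                                 # step 5: C = A_S
        P = C @ (np.linalg.pinv(C) @ A)             # projection C C^+ A
        U, s, Vt = np.linalg.svd(P, full_matrices=False)
        r = min(ell * k, len(s))                    # assumed truncation rank l*k
        E = A - (U[:, :r] * s[:r]) @ Vt[:r]         # step 5: new residual E_l
    # A column can be drawn again in a later round, so len(S) may be below t*c.
    return C, S


def residual_error(A, C):
    """Frobenius norm of A minus its projection onto the span of C."""
    return np.linalg.norm(A - C @ (np.linalg.pinv(C) @ A))


if __name__ == "__main__":
    k = 5
    A = np.random.default_rng(42).standard_normal((60, 200))
    errors = [
        residual_error(A, adaptive_sampling(A, k=k, t=3, c=2 * k, seed=s)[0])
        for s in range(5)  # five repetitions, averaged
    ]
    print("mean residual error:", float(np.mean(errors)))
```

Passing a different `sampler` callable with the same `(E, c, rng)` signature is one way to swap in another column selection routine without touching the loop.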