Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Column Selection via Adaptive Sampling

Authors: Saurabh Paul, Malik Magdon-Ismail, Petros Drineas

NeurIPS 2015

Reproducibility Variable Result LLM Response
Research Type Experimental Our experimental results on synthetic and real-world data show that our algorithm outperforms non-adaptive sampling as well as prior adaptive sampling approaches.
Researcher Affiliation Collaboration Saurabh Paul Global Risk Sciences, Paypal Inc. EMAIL Malik Magdon-Ismail CS Dept., Rensselaer Polytechnic Institute EMAIL Petros Drineas CS Dept., Rensselaer Polytechnic Institute EMAIL
Pseudocode Yes Algorithm 1: Adaptive Sampling. Input: $A \in \mathbb{R}^{m \times n}$; target rank $k$; # rounds $t$; columns per round $c$. Output: $C \in \mathbb{R}^{m \times tc}$, $tc$ columns of $A$, and $S$, the indices of those columns. 1: $S = \{\}$; $E_0 = A$. 2: for $\ell = 1, \dots, t$ do 3: Sample indices $S_\ell$ of $c$ columns from $E_{\ell-1}$ using a CSSP-algorithm. 4: $S \leftarrow S \cup S_\ell$. 5: Set $C = A_S$ and $E_\ell = A - (CC^+A)_{\ell k}$. 6: return $C$, $S$
Open Source Code No The paper states 'We implemented our algorithm using two relative-error column selection algorithms' but does not provide concrete access to source code or explicitly state its release.
Open Datasets Yes HGDP 22 chromosomes: SNPs human chromosome data from the HGDP database [26]. We use all 22 chromosome matrices (1043 rows; 7,334-37,493 columns) and report the average. Each matrix contains +1, 0, −1 entries, and we randomly filled in missing entries. Tech TC-300: 49 document-term matrices [27] (150-300 rows (documents); 10,000-40,000 columns (words)).
Dataset Splits No The paper does not provide specific dataset split information for validation.
Hardware Specification No The paper does not provide specific hardware details used for running its experiments.
Software Dependencies No The paper mentions using specific algorithms ('near-optimal column selection algorithm of [18, 15]' and 'leverage-score sampling algorithm of [19]') but does not list specific software dependencies with version numbers.
Experiment Setup Yes We set the target rank k = 5 and the number of columns in each round to c = 2k. We have tried several choices for k and c and the results are qualitatively identical so we only report on one choice. For randomized algorithms, we repeat the experiments five times and take the average.
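
The Pseudocode and Experiment Setup rows above can be illustrated with a minimal NumPy sketch. It rests on several assumptions: the paper's code is not released, so this is not the authors' implementation. The CSSP step is replaced by a simple rank-k leverage-score sampler (the helper `leverage_score_sample` is hypothetical, not the algorithm of [18, 15] or [19]). The subscript in (CC^+A)_{ℓk} is read here as the best rank-ℓk approximation. Already-selected columns are excluded from later rounds so that exactly tc distinct columns are returned. The synthetic Gaussian matrix, the choice of t = 3 rounds, and the seeds are placeholders.

```python
import numpy as np


def leverage_score_sample(E, k, c, exclude, rng):
    """Hypothetical stand-in for the CSSP step: draw c distinct column
    indices of E with probability proportional to rank-k leverage scores."""
    _, _, Vt = np.linalg.svd(E, full_matrices=False)
    p = np.sum(Vt[:k] ** 2, axis=0)      # leverage score of each column
    p[exclude] = 0.0                     # assumption: never re-pick a column
    if np.count_nonzero(p) < c:          # degenerate residual: uniform fallback
        p = np.ones(E.shape[1])
        p[exclude] = 0.0
    return rng.choice(E.shape[1], size=c, replace=False, p=p / p.sum())


def adaptive_sampling(A, k, t, c, seed=0):
    """Sketch of Algorithm 1: t rounds, c columns per round, target rank k.
    Assumes t >= 1 and t * c <= number of columns of A."""
    rng = np.random.default_rng(seed)
    S = []                               # step 1: S = {}
    E = A.copy()                         # step 1: E_0 = A
    for ell in range(1, t + 1):          # step 2
        S_ell = leverage_score_sample(E, k, c, np.array(S, dtype=int), rng)
        S.extend(int(j) for j in S_ell)  # step 4: S <- S U S_ell
        C = A[:, S]                      # step 5: C = A_S
        P = C @ np.linalg.pinv(C) @ A    # CC^+A
        U, s, Vt = np.linalg.svd(P, full_matrices=False)
        r = min(ell * k, len(s))         # truncate to rank ell * k
        E = A - (U[:, :r] * s[:r]) @ Vt[:r]
    return C, S                          # step 6


# Placeholder setup mirroring the reported choices k = 5 and c = 2k,
# averaged over five random seeds.
A = np.random.default_rng(42).standard_normal((50, 200))
k, t = 5, 3
c = 2 * k
errors = []
for seed in range(5):
    C, S = adaptive_sampling(A, k, t, c, seed=seed)
    errors.append(np.linalg.norm(A - C @ np.linalg.pinv(C) @ A, "fro"))
print("mean reconstruction error:", np.mean(errors))
```

The reported quantity is the Frobenius norm of A − CC^+A, a common reconstruction error measure for column selection; which error measure the paper tabulates is not stated in the rows above.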