Column Selection via Adaptive Sampling

Authors: Saurabh Paul, Malik Magdon-Ismail, Petros Drineas

NeurIPS 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Our experimental results on synthetic and real-world data show that our algorithm outperforms non-adaptive sampling as well as prior adaptive sampling approaches.
Researcher Affiliation Collaboration Saurabh Paul Global Risk Sciences, Paypal Inc. saupaul@paypal.com Malik Magdon-Ismail CS Dept., Rensselaer Polytechnic Institute magdon@cs.rpi.edu Petros Drineas CS Dept., Rensselaer Polytechnic Institute drinep@cs.rpi.edu
Pseudocode Yes Algorithm 1: Adaptive Sampling Input: A Rm n; target rank k; # rounds t; columns per round c Output: C Rm tc, tc columns of A and S, the indices of those columns. 1: S = {}; E0 = A 2: for ℓ= 1, , t do 3: Sample indices Sℓof c columns from Eℓ 1 using a CSSP-algorithm. 4: S S Sℓ. 5: Set C = AS and Eℓ= A (CC+A)ℓk. 6: return C, S
Open Source Code No The paper states 'We implemented our algorithm using two relative-error column selection algorithms' but does not provide concrete access to source code or explicitly state its release.
Open Datasets Yes HGDP 22 chromosomes: SNPs human chromosome data from the HGDP database [26]. We use all 22 chromosome matrices (1043 rows; 7,334-37,493 columns) and report the average. Each matrix contains +1, 0, 1 entries, and we randomly filled in missing entries. Tech TC-300: 49 document-term matrices [27] (150-300 rows (documents); 10,000-40,000 columns (words)).
Dataset Splits No The paper does not provide specific dataset split information for validation.
Hardware Specification No The paper does not provide specific hardware details used for running its experiments.
Software Dependencies No The paper mentions using specific algorithms ('near-optimal column selection algorithm of [18, 15]' and 'leverage-score sampling algorithm of [19]') but does not list specific software dependencies with version numbers.
Experiment Setup Yes We set the target rank k = 5 and the number of columns in each round to c = 2k. We have tried several choices for k and c and the results are qualitatively identical so we only report on one choice. For randomized algorithms, we repeat the experiments five times and take the average.