Column Selection via Adaptive Sampling
Authors: Saurabh Paul, Malik Magdon-Ismail, Petros Drineas
NeurIPS 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results on synthetic and real-world data show that our algorithm outperforms non-adaptive sampling as well as prior adaptive sampling approaches. |
| Researcher Affiliation | Collaboration | Saurabh Paul Global Risk Sciences, Paypal Inc. saupaul@paypal.com Malik Magdon-Ismail CS Dept., Rensselaer Polytechnic Institute magdon@cs.rpi.edu Petros Drineas CS Dept., Rensselaer Polytechnic Institute drinep@cs.rpi.edu |
| Pseudocode | Yes | Algorithm 1: Adaptive Sampling Input: A Rm n; target rank k; # rounds t; columns per round c Output: C Rm tc, tc columns of A and S, the indices of those columns. 1: S = {}; E0 = A 2: for ℓ= 1, , t do 3: Sample indices Sℓof c columns from Eℓ 1 using a CSSP-algorithm. 4: S S Sℓ. 5: Set C = AS and Eℓ= A (CC+A)ℓk. 6: return C, S |
| Open Source Code | No | The paper states 'We implemented our algorithm using two relative-error column selection algorithms' but does not provide concrete access to source code or explicitly state its release. |
| Open Datasets | Yes | HGDP 22 chromosomes: SNPs human chromosome data from the HGDP database [26]. We use all 22 chromosome matrices (1043 rows; 7,334-37,493 columns) and report the average. Each matrix contains +1, 0, 1 entries, and we randomly filled in missing entries. Tech TC-300: 49 document-term matrices [27] (150-300 rows (documents); 10,000-40,000 columns (words)). |
| Dataset Splits | No | The paper does not provide specific dataset split information for validation. |
| Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments. |
| Software Dependencies | No | The paper mentions using specific algorithms ('near-optimal column selection algorithm of [18, 15]' and 'leverage-score sampling algorithm of [19]') but does not list specific software dependencies with version numbers. |
| Experiment Setup | Yes | We set the target rank k = 5 and the number of columns in each round to c = 2k. We have tried several choices for k and c and the results are qualitatively identical so we only report on one choice. For randomized algorithms, we repeat the experiments five times and take the average. |