reproducibilityindex.ai

Column Selection via Adaptive Sampling

Authors: Saurabh Paul, Malik Magdon-Ismail, Petros Drineas

NeurIPS 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experimental results on synthetic and real-world data show that our algorithm outperforms non-adaptive sampling as well as prior adaptive sampling approaches.
Researcher Affiliation	Collaboration	Saurabh Paul Global Risk Sciences, Paypal Inc. saupaul@paypal.com Malik Magdon-Ismail CS Dept., Rensselaer Polytechnic Institute magdon@cs.rpi.edu Petros Drineas CS Dept., Rensselaer Polytechnic Institute drinep@cs.rpi.edu
Pseudocode	Yes	Algorithm 1: Adaptive Sampling Input: A Rm n; target rank k; # rounds t; columns per round c Output: C Rm tc, tc columns of A and S, the indices of those columns. 1: S = {}; E0 = A 2: for ℓ= 1, , t do 3: Sample indices Sℓof c columns from Eℓ 1 using a CSSP-algorithm. 4: S S Sℓ. 5: Set C = AS and Eℓ= A (CC+A)ℓk. 6: return C, S
Open Source Code	No	The paper states 'We implemented our algorithm using two relative-error column selection algorithms' but does not provide concrete access to source code or explicitly state its release.
Open Datasets	Yes	HGDP 22 chromosomes: SNPs human chromosome data from the HGDP database [26]. We use all 22 chromosome matrices (1043 rows; 7,334-37,493 columns) and report the average. Each matrix contains +1, 0, 1 entries, and we randomly ﬁlled in missing entries. Tech TC-300: 49 document-term matrices [27] (150-300 rows (documents); 10,000-40,000 columns (words)).
Dataset Splits	No	The paper does not provide specific dataset split information for validation.
Hardware Specification	No	The paper does not provide specific hardware details used for running its experiments.
Software Dependencies	No	The paper mentions using specific algorithms ('near-optimal column selection algorithm of [18, 15]' and 'leverage-score sampling algorithm of [19]') but does not list specific software dependencies with version numbers.
Experiment Setup	Yes	We set the target rank k = 5 and the number of columns in each round to c = 2k. We have tried several choices for k and c and the results are qualitatively identical so we only report on one choice. For randomized algorithms, we repeat the experiments ﬁve times and take the average.