Towards a Zero-One Law for Column Subset Selection

Authors: Zhao Song, David Woodruff, Peilin Zhong

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 3 Experiments
Researcher Affiliation | Academia | Zhao Song, University of Washington (magic.linuxkde@gmail.com); David P. Woodruff, Carnegie Mellon University (dwoodruf@cs.cmu.edu); Peilin Zhong, Columbia University (pz2225@columbia.edu)
Pseudocode | Yes | Algorithm 1: "Low rank approximation algorithm for general functions"
Open Source Code | No | The paper does not provide a repository link or any explicit statement that source code for the described methodology is released.
Open Datasets | Yes | "Large sparse datasets in recommendation systems are common, such as the Bookcrossing (100K x 300K with 10^6 observations) [11] and Yelp (40K x 10K with 10^5 observations) [12] datasets."
Dataset Splits | No | The paper describes the construction of a synthetic dataset but does not provide details on how it was split into training, validation, or test sets.
Hardware Specification | No | The paper does not provide any details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper does not list software dependencies with version numbers (e.g., programming languages, libraries, or solvers).
Experiment Setup | Yes | "In each iteration, we choose 2k columns to fit the remaining columns, and we drop half of the columns with the smallest regression cost. In each iteration, we repeat 20 times to find the best 2k columns. At the end, if there are at most 4k columns remaining, we finish our algorithm. We choose to optimize the Huber loss function, i.e., f(x) = x^2/2 for |x| <= 1, and f(x) = |x| - 1/2 for |x| > 1."
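
To make the quoted setup concrete, here is a minimal NumPy sketch of one plausible reading of that procedure. It is a reconstruction, not the authors' code (none is released, per the table above): the function names are hypothetical, the IRLS solver for the Huber fits is an assumption (the paper does not say which solver it uses), "best 2k columns" is taken to mean lowest total Huber cost, and the choice to keep the selected 2k columns in the candidate pool while dropping the cheapest half of the fitted columns is one reading of the quote.

```python
import numpy as np

def huber(x, delta=1.0):
    """Elementwise Huber loss: x^2/2 for |x| <= delta, delta*(|x| - delta/2)
    otherwise. With delta = 1 this is exactly the f(x) quoted above."""
    a = np.abs(x)
    return np.where(a <= delta, 0.5 * x * x, delta * (a - 0.5 * delta))

def huber_fit(B, y, delta=1.0, iters=25):
    """Minimize sum_i huber((B @ w - y)_i) over w via iteratively
    reweighted least squares (IRLS); the solver choice is an assumption."""
    w = np.linalg.lstsq(B, y, rcond=None)[0]          # least-squares warm start
    for _ in range(iters):
        r = B @ w - y
        a = np.maximum(np.abs(r), 1e-12)
        s = np.where(a <= delta, 1.0, delta / a)      # standard Huber IRLS weights
        sw = np.sqrt(s)
        w = np.linalg.lstsq(B * sw[:, None], y * sw, rcond=None)[0]
    return w

def select_columns(A, k, trials=20, seed=0):
    """Iterative column subset selection as described in the quoted setup."""
    rng = np.random.default_rng(seed)
    cand = np.arange(A.shape[1])                      # surviving candidate columns
    while len(cand) > 4 * k:                          # "at most 4k columns remaining"
        best = None
        for _ in range(trials):                       # "repeat 20 times"
            S = rng.choice(cand, size=2 * k, replace=False)
            rest = np.setdiff1d(cand, S)
            B = A[:, S]
            costs = np.array([huber(B @ huber_fit(B, A[:, j]) - A[:, j]).sum()
                              for j in rest])
            if best is None or costs.sum() < best[0]:
                best = (costs.sum(), S, rest, costs)
        _, S, rest, costs = best
        # Drop the cheaper half of the fitted columns: they are already
        # well explained by the selected 2k columns.
        order = np.argsort(costs)
        cand = np.concatenate([S, rest[order[len(rest) // 2:]]])
    return cand

# Example on a small synthetic matrix (sizes are illustrative only).
A = np.random.default_rng(1).standard_normal((50, 40))
print(select_columns(A, k=3))
```

The candidate pool roughly halves each round, so the loop reaches the 4k stopping condition after O(log(d/k)) iterations; the per-column IRLS fits are left unoptimized for readability.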