Towards a Zero-One Law for Column Subset Selection
Authors: Zhao Song, David Woodruff, Peilin Zhong
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 3 Experiments |
| Researcher Affiliation | Academia | Zhao Song, University of Washington, magic.linuxkde@gmail.com; David P. Woodruff, Carnegie Mellon University, dwoodruf@cs.cmu.edu; Peilin Zhong, Columbia University, pz2225@columbia.edu |
| Pseudocode | Yes | Algorithm 1 Low rank approximation algorithm for general functions |
| Open Source Code | No | The paper does not provide a specific repository link or an explicit statement about the release of source code for the methodology described. |
| Open Datasets | Yes | Large sparse datasets in recommendation systems are common, such as the Bookcrossing dataset (100K × 300K with 10^6 observations) [11] and the Yelp dataset (40K × 10K with 10^5 observations) [12] |
| Dataset Splits | No | The paper describes the construction of a synthetic dataset but does not provide specific details on how it was split into training, validation, or test sets. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., programming languages, libraries, or solvers). |
| Experiment Setup | Yes | In each iteration, we choose 2k columns to fit the remaining columns, and we drop the half of the columns with the smallest regression cost. In each iteration, we repeat 20 times to find the best 2k columns. We finish the algorithm once at most 4k columns remain. We choose to optimize the Huber loss function, i.e., f(x) = x^2/2 for \|x\| <= 1, and f(x) = \|x\| - 1/2 for \|x\| > 1. |
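The experiment-setup row above can be sketched in code. This is a hypothetical reconstruction, not the authors' implementation: the function names (`huber`, `select_columns`) are invented, and an ordinary least-squares fit is used as a cheap stand-in for the paper's Huber-loss regression when scoring each draw of columns.

```python
import numpy as np

def huber(x):
    """Huber loss with delta = 1: x^2/2 for |x| <= 1, |x| - 1/2 otherwise."""
    a = np.abs(x)
    return np.where(a <= 1.0, 0.5 * a * a, a - 0.5)

def select_columns(A, k, repeats=20, rng=None):
    """Toy iterative column subset selection (hypothetical reconstruction).

    In each round, try `repeats` random draws of 2k columns, fit every
    remaining column on the drawn ones, score the residuals with the
    Huber loss, keep the best draw, and drop the half of the remaining
    columns with the smallest regression cost. Stop once at most 4k
    columns remain.
    """
    rng = np.random.default_rng(rng)
    remaining = np.arange(A.shape[1])
    while len(remaining) > 4 * k:
        best_total, best_keep = np.inf, None
        for _ in range(repeats):
            chosen = rng.choice(remaining, size=2 * k, replace=False)
            rest = np.setdiff1d(remaining, chosen)
            # Least-squares fit as a stand-in for Huber regression.
            X, _, _, _ = np.linalg.lstsq(A[:, chosen], A[:, rest], rcond=None)
            residual = A[:, rest] - A[:, chosen] @ X
            cost = huber(residual).sum(axis=0)  # per-column regression cost
            if cost.sum() < best_total:
                best_total = cost.sum()
                # Drop the half of `rest` with the smallest cost.
                order = np.argsort(cost)
                best_keep = np.concatenate([chosen, rest[order[len(rest) // 2:]]])
        remaining = np.sort(best_keep)
    return remaining
```

With k = 1 and 20 starting columns, the loop shrinks the candidate set to at most 4 columns in a few rounds; the halving schedule guarantees termination.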