Towards a Zero-One Law for Column Subset Selection

Authors: Zhao Song, David Woodruff, Peilin Zhong

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 3 Experiments
Researcher Affiliation | Academia | Zhao Song, University of Washington (magic.linuxkde@gmail.com); David P. Woodruff, Carnegie Mellon University (dwoodruf@cs.cmu.edu); Peilin Zhong, Columbia University (pz2225@columbia.edu)
Pseudocode | Yes | Algorithm 1: "Low rank approximation algorithm for general functions"
Open Source Code | No | The paper does not provide a repository link or any explicit statement that source code for the described methodology is released.
Open Datasets | Yes | "Large sparse datasets in recommendation systems are common, such as the Bookcrossing (100K x 300K with 10^6 observations) [11] and Yelp (40K x 10K with 10^5 observations) [12] datasets."
Dataset Splits | No | The paper describes the construction of a synthetic dataset but does not provide details on how it was split into training, validation, or test sets.
Hardware Specification | No | The paper does not provide any details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper does not list software dependencies with version numbers (e.g., programming languages, libraries, or solvers).
Experiment Setup | Yes | "In each iteration, we choose 2k columns to fit the remaining columns, and we drop half of the columns with the smallest regression cost. In each iteration, we repeat 20 times to find the best 2k columns. At the end, if there are at most 4k columns remaining, we finish our algorithm. We choose to optimize the Huber loss function, i.e., f(x) = x^2/2 for |x| <= 1, and f(x) = |x| - 1/2 for |x| > 1."
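
To make the quoted setup concrete, here is a minimal NumPy sketch of one plausible reading of that procedure. It is a reconstruction, not the authors' code (none is released, per the table above): the function names are hypothetical, the IRLS solver for the Huber fits is an assumption (the paper does not say which solver it uses), "best 2k columns" is taken to mean lowest total Huber cost, and the choice to keep the selected 2k columns in the candidate pool while dropping the cheapest half of the fitted columns is one reading of the quote.

```python
import numpy as np

def huber(x, delta=1.0):
    """Elementwise Huber loss: x^2/2 for |x| <= delta, delta*(|x| - delta/2)
    otherwise. With delta = 1 this is exactly the f(x) quoted above."""
    a = np.abs(x)
    return np.where(a <= delta, 0.5 * x * x, delta * (a - 0.5 * delta))

def huber_fit(B, y, delta=1.0, iters=25):
    """Minimize sum_i huber((B @ w - y)_i) over w via iteratively
    reweighted least squares (IRLS); the solver choice is an assumption."""
    w = np.linalg.lstsq(B, y, rcond=None)[0]          # least-squares warm start
    for _ in range(iters):
        r = B @ w - y
        a = np.maximum(np.abs(r), 1e-12)
        s = np.where(a <= delta, 1.0, delta / a)      # standard Huber IRLS weights
        sw = np.sqrt(s)
        w = np.linalg.lstsq(B * sw[:, None], y * sw, rcond=None)[0]
    return w

def select_columns(A, k, trials=20, seed=0):
    """Iterative column subset selection as described in the quoted setup."""
    rng = np.random.default_rng(seed)
    cand = np.arange(A.shape[1])                      # surviving candidate columns
    while len(cand) > 4 * k:                          # "at most 4k columns remaining"
        best = None
        for _ in range(trials):                       # "repeat 20 times"
            S = rng.choice(cand, size=2 * k, replace=False)
            rest = np.setdiff1d(cand, S)
            B = A[:, S]
            costs = np.array([huber(B @ huber_fit(B, A[:, j]) - A[:, j]).sum()
                              for j in rest])
            if best is None or costs.sum() < best[0]:
                best = (costs.sum(), S, rest, costs)
        _, S, rest, costs = best
        # Drop the cheaper half of the fitted columns: they are already
        # well explained by the selected 2k columns.
        order = np.argsort(costs)
        cand = np.concatenate([S, rest[order[len(rest) // 2:]]])
    return cand

# Example on a small synthetic matrix (sizes are illustrative only).
A = np.random.default_rng(1).standard_normal((50, 40))
print(select_columns(A, k=3))
```

The candidate pool roughly halves each round, so the loop reaches the 4k stopping condition after O(log(d/k)) iterations; the per-column IRLS fits are left unoptimized for readability.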