Dimensionality Reduction for Tukey Regression
Authors: Kenneth Clarkson, Ruosong Wang, David Woodruff
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We give the first dimensionality reduction methods for the overconstrained Tukey regression problem... Our methods reduce a given Tukey regression problem to a smaller weighted version... Our reductions are fast, simple and easy to implement, and we give empirical results demonstrating their practicality, using existing heuristic solvers for the small versions. We also give exponential-time algorithms giving provably good solutions, and hardness results suggesting that a significant speedup in the worst case is unlikely. |
| Researcher Affiliation | Collaboration | 1IBM Research Almaden, San Jose, California, USA 2Carnegie Mellon University, Pittsburgh, Pennsylvania, USA. |
| Pseudocode | Yes | Figure 2. Polynomial time algorithm for finding heavy coordinates. Figure 3. Input-sparsity time algorithm for finding heavy coordinates. |
| Open Source Code | No | The paper mentions using and modifying the 'LinvPy software (lin)' and provides a link to it, but this is a third-party tool used by the authors, not open-source code for their own proposed dimensionality reduction methodology. |
| Open Datasets | Yes | The remaining datasets are chosen from the UCI Machine Learning Repository. |
| Dataset Splits | No | The paper does not specify exact training, validation, or test split percentages or sample counts for the datasets used. |
| Hardware Specification | No | The paper does not specify any hardware details like GPU/CPU models, memory, or cloud instance types used for the experiments. |
| Software Dependencies | No | The paper mentions using 'LinvPy software (lin)' but does not provide a specific version number for it or any other software used. |
| Experiment Setup | Yes | For all datasets, the number of data points is n = 10000. We varied the size of the sketch from 2d to 10d (d is the dimension of the dataset)... For each dataset, we also randomly select 5% of the entries of the b vector and change them to 10^4, to model outliers. We repeated each experiment ten times and took the best result among all repetitions. |
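The experiment setup quoted above can be sketched in a few lines. This is a hypothetical reconstruction, not the authors' code: the synthetic data, the dense Gaussian sketch, and the least-squares solve of the reduced problem are all assumptions standing in for the paper's UCI datasets, sketching maps, and heuristic Tukey solvers.

```python
import numpy as np

rng = np.random.default_rng(0)

# n = 10000 data points in d dimensions (d = 10 here is an arbitrary choice).
n, d = 10000, 10
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

# Randomly select 5% of the entries of b and set them to 10^4 to model outliers.
outliers = rng.choice(n, size=n // 20, replace=False)
b[outliers] = 1e4

# Vary the sketch size m from 2d to 10d and solve each reduced problem.
solutions = []
for m in range(2 * d, 10 * d + 1, 2 * d):
    S = rng.standard_normal((m, n)) / np.sqrt(m)  # dense Gaussian sketch (assumed)
    x, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
    solutions.append(x)
```

In the paper, each such configuration was repeated ten times and the best result kept; here a single run per sketch size is shown for brevity.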