Dimensionality Reduction for Tukey Regression
Authors: Kenneth Clarkson, Ruosong Wang, David Woodruff
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We give the first dimensionality reduction methods for the overconstrained Tukey regression problem... Our methods reduce a given Tukey regression problem to a smaller weighted version... Our reductions are fast, simple and easy to implement, and we give empirical results demonstrating their practicality, using existing heuristic solvers for the small versions. We also give exponential-time algorithms giving provably good solutions, and hardness results suggesting that a significant speedup in the worst case is unlikely. |
| Researcher Affiliation | Collaboration | 1IBM Research Almaden, San Jose, California, USA 2Carnegie Mellon University, Pittsburgh, Pennsylvania, USA. |
| Pseudocode | Yes | Figure 2. Polynomial time algorithm for finding heavy coordinates. Figure 3. Input-sparsity time algorithm for finding heavy coordinates. |
| Open Source Code | No | The paper mentions using and modifying the 'LinvPy software (lin)' and provides a link to it, but this is a third-party tool used by the authors, not open-source code for their own proposed dimensionality reduction methodology. |
| Open Datasets | Yes | The remaining datasets are chosen from the UCI Machine Learning Repository. |
| Dataset Splits | No | The paper does not specify exact training, validation, or test split percentages or sample counts for the datasets used. |
| Hardware Specification | No | The paper does not specify any hardware details like GPU/CPU models, memory, or cloud instance types used for the experiments. |
| Software Dependencies | No | The paper mentions using 'LinvPy software (lin)' but does not provide a specific version number for it or any other software used. |
| Experiment Setup | Yes | For all datasets, the number of data points is n = 10000. We varied the size of the sketch from 2d to 10d (d is the dimension of the dataset)... For each dataset, we also randomly select 5% of the entries of the b vector and change them to 10^4, to model outliers. We repeated each experiment ten times and took the best result among all repetitions. |
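The experiment setup quoted above can be sketched in a few lines. This is a hypothetical reconstruction, not the authors' code: the synthetic data, the dense Gaussian sketch, and the least-squares solve of the reduced problem are all assumptions standing in for the paper's UCI datasets, sketching maps, and heuristic Tukey solvers.

```python
import numpy as np

rng = np.random.default_rng(0)

# n = 10000 data points in d dimensions (d = 10 here is an arbitrary choice).
n, d = 10000, 10
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

# Randomly select 5% of the entries of b and set them to 10^4 to model outliers.
outliers = rng.choice(n, size=n // 20, replace=False)
b[outliers] = 1e4

# Vary the sketch size m from 2d to 10d and solve each reduced problem.
solutions = []
for m in range(2 * d, 10 * d + 1, 2 * d):
    S = rng.standard_normal((m, n)) / np.sqrt(m)  # dense Gaussian sketch (assumed)
    x, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
    solutions.append(x)
```

In the paper, each such configuration was repeated ten times and the best result kept; here a single run per sketch size is shown for brevity.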