Computing Approximate $\ell_p$ Sensitivities

Authors: Swati Padmanabhan, David Woodruff, Richard Zhang

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We consolidate our theoretical contributions with a demonstration of the empirical advantage of our approximation algorithms over the naive approach of estimating these sensitivities, using the UCI Machine Learning Dataset Repository [46], in Section 4. We found that many datasets have extremely small total sensitivities; fast sensitivity approximation algorithms therefore exploit this small intrinsic dimensionality of real-world data far better than Lewis weights sampling, with sampling bounds often a factor of 2-5x better. We also found these sensitivities to be easy to approximate, with an accuracy-runtime tradeoff much better than our theory suggests and runtime speedups of up to 40x. Lastly, we show that our algorithm for estimating the total sensitivity produces accurate estimates, up to small constant factors, with a runtime speedup of at least 4x. (A hedged baseline computation of these sensitivities for p = 2 is sketched below the table.)
Researcher Affiliation | Collaboration | Swati Padmanabhan, Massachusetts Institute of Technology, pswt@mit.edu; David P. Woodruff, Carnegie Mellon University, dwoodruf@cs.cmu.edu; Qiuyi (Richard) Zhang, Google Research, qiuyiz@google.com
Pseudocode | Yes | Algorithm 1: Approximating ℓ1-Sensitivities: Row-wise Approximation; Algorithm 2: Approximating the Sum of ℓp-Sensitivities for p > 1; Algorithm 3: Approximating the Maximum of ℓ1-Sensitivities; Algorithm 4: Approximating the Sum of ℓ1-Sensitivities; Algorithm 5: Subroutine for Approximating the Sum of ℓ1-Sensitivities; Algorithm 6: Approximating ℓp-Sensitivities: Row-wise Approximation; Algorithm 7: Approximating the Maximum of ℓp-Sensitivities. (A generic sketching-based approximation, not these algorithms, is also shown below the table for illustration.)
Open Source Code | No | The paper mentions using the UCI Machine Learning Dataset Repository [46] but does not provide any statement or link for the source code of the authors' own methods.
Open Datasets | Yes | We demonstrate our fast sensitivity approximations on multiple real-world datasets in the UCI Machine Learning Dataset Repository [46], such as the wine and fires datasets, for different p and varying approximation parameters α. We focus on the wine dataset here, with full details in Appendix E.
Dataset Splits | No | The paper uses datasets from the UCI Machine Learning Dataset Repository but does not describe specific training, validation, or test splits for these datasets in its experiments.
Hardware Specification | No | The paper states "We demonstrate our fast sensitivity approximations on multiple real-world datasets..." but does not provide any details about the specific hardware (CPU, GPU, memory, etc.) used for these experiments.
Software Dependencies | No | The paper describes algorithms and experiments but does not provide specific version numbers for software dependencies or libraries used in the implementation (e.g., "Python 3.x", "NumPy 1.x").
Experiment Setup | No | Section 4, titled "Numerical experiments", describes the datasets and metrics used, but does not provide specific experimental setup details such as hyperparameters (e.g., learning rates, batch sizes, epochs), model initialization, or specific optimizer settings.
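
For context on the "naive approach" referenced in the Research Type row: for p = 2 the ℓp sensitivities coincide with leverage scores and admit a closed form, so the exact baseline can be written in a few lines of NumPy. The sketch below is ours, not code from the paper; the function name, matrix sizes, and random test data are illustrative assumptions.

```python
import numpy as np

def l2_sensitivities_exact(A):
    """Exact l2 sensitivities (leverage scores) of the rows of A.

    For p = 2, sigma_i(A) = max_x (a_i^T x)^2 / ||A x||_2^2 = a_i^T (A^T A)^+ a_i,
    and the total sensitivity sum_i sigma_i equals rank(A) <= d.
    Assumes A has full column rank (hypothetical setup, not from the paper).
    """
    # A thin QR factorization gives an orthonormal basis Q for the column space of A,
    # so the leverage score of row i is the squared row norm ||Q[i, :]||_2^2.
    Q, _ = np.linalg.qr(A, mode="reduced")
    return np.sum(Q * Q, axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((10_000, 20))  # hypothetical n x d data matrix
    s = l2_sensitivities_exact(A)
    print(f"max sensitivity = {s.max():.4f}, total sensitivity = {s.sum():.1f}")  # total is about d = 20
```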
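To illustrate the accuracy-versus-runtime tradeoff that the algorithms in the Pseudocode row target, here is a hedged sketch of a standard sketching-based approximation of ℓ2 sensitivities (fast approximate leverage scores). It is not the paper's Algorithm 1 or 6; the dense Gaussian sketch, sketch sizes, and function name are illustrative assumptions, and a fast transform such as SRHT or CountSketch would replace the dense sketch at scale.

```python
import numpy as np

def l2_sensitivities_sketched(A, sketch_rows=None, jl_dim=16, rng=None):
    """Approximate l2 sensitivities via sketching (a standard fast baseline).

    Step 1: sketch A down to s x d, take the R factor of S A, so A R^{-1} is
            close to orthonormal and ||a_i^T R^{-1}||_2^2 approximates the
            leverage score of row i.
    Step 2: estimate those row norms with a Johnson-Lindenstrauss projection,
            so each row costs O(d * jl_dim) instead of O(d^2).
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = A.shape
    s = sketch_rows if sketch_rows is not None else 4 * d
    # Dense Gaussian sketch for readability; a fast transform would be used for large n.
    S = rng.standard_normal((s, n)) / np.sqrt(s)
    _, R = np.linalg.qr(S @ A, mode="reduced")
    G = rng.standard_normal((d, jl_dim)) / np.sqrt(jl_dim)  # JL projection
    B = A @ np.linalg.solve(R, G)                           # rows approximate a_i^T R^{-1} G
    return np.sum(B * B, axis=1)
```

Comparing these estimates against the exact baseline above on the same matrix gives a rough, small-scale analogue of the accuracy-runtime comparisons reported in the paper's Section 4.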