Turnstile $\ell_p$ leverage score sampling with applications
Authors: Alexander Munteanu, Simon Omlor
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare experimentally to plain oblivious sketching and plain leverage score sampling algorithms for ℓp and logistic regression. |
| Researcher Affiliation | Academia | 1Dortmund Data Science Center, Faculties of Statistics and Computer Science, TU Dortmund University, Dortmund, Germany 2Faculty of Statistics, TU Dortmund University, Dortmund, Germany 3Lamarr-Institute for Machine Learning and Artificial Intelligence, Dortmund, Germany. Correspondence to: Alexander Munteanu <alexander.munteanu@tu-dortmund.de>, Simon Omlor <simon.omlor@tu-dortmund.de>. |
| Pseudocode | Yes | Algorithm 1 Finding ℓp heavy hitters. |
| Open Source Code | Yes | Our new and combined code is available at https://github.com/Tim907/turnstile-sampling. |
| Open Datasets | Yes | The following real-world datasets have become standard baselines to measure the performance of data reduction algorithms for logistic regression and ℓ1 regression: Covertype, Webspam, and KDDCup, see Appendix I.2 for details. |
| Dataset Splits | No | For each dataset, and each of the two problems, we first solve the original large instance to optimality to obtain zopt. We then run the data reduction algorithms, for varying target coreset resp. sketch sizes, and solve the reduced and reweighted problem to optimality to obtain the approximation z̃. For each target size, we repeat this process 21 times and plot in Figure 1 the median of the resulting approximation ratios f(z̃)/f(zopt). |
| Hardware Specification | Yes | All experiments were run on a workstation with AMD Ryzen Threadripper PRO 5975WX, 32 cores at 3.6GHz, 512GB DDR4-3200. |
| Software Dependencies | No | We experienced convergence problems using the scipy optimizer for the non-differentiable ℓ1 loss. |
| Experiment Setup | No | For each dataset, and each of the two problems, we first solve the original large instance to optimality to obtain zopt. We then run the data reduction algorithms, for varying target coreset resp. sketch sizes, and solve the reduced and reweighted problem to optimality to obtain the approximation z̃. For each target size, we repeat this process 21 times and plot in Figure 1 the median of the resulting approximation ratios f(z̃)/f(zopt). |
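The evaluation protocol quoted above (solve the full instance once, then repeatedly reduce, re-solve, and report the median approximation ratio over 21 runs) can be sketched as follows. This is a hedged illustration, not the paper's code: it substitutes uniform row sampling for the paper's turnstile ℓp leverage score sampling, and an ℓ2 least-squares objective for their ℓ1/logistic losses, so that a closed-form solver (`numpy.linalg.lstsq`) keeps the sketch self-contained.

```python
import numpy as np

def fit(A, b):
    # Closed-form least-squares solve (l2 stand-in for the paper's
    # l1 / logistic objectives, which need an iterative solver).
    z, *_ = np.linalg.lstsq(A, b, rcond=None)
    return z

def loss(A, b, z):
    # Full-data objective f(z); approximation ratios are always
    # measured on the ORIGINAL data, not on the reduced instance.
    return np.linalg.norm(A @ z - b)

rng = np.random.default_rng(0)
n, d, k = 5000, 10, 200  # original size, dimension, target coreset size
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

z_opt = fit(A, b)  # solve the large instance to optimality once

ratios = []
for _ in range(21):  # 21 repetitions per target size, as in the paper
    idx = rng.choice(n, size=k, replace=False)  # placeholder: uniform sampling
    w = np.sqrt(n / k)  # reweight so the subsample estimates the full loss
    z_tilde = fit(w * A[idx], w * b[idx])  # solve the reduced problem
    ratios.append(loss(A, b, z_tilde) / loss(A, b, z_opt))

median_ratio = np.median(ratios)
print(median_ratio)  # values near 1 mean the reduction preserves the optimum
```

In the paper's experiments the sampling step would instead be one of the compared data reduction schemes (turnstile sampling, oblivious sketching, or plain leverage score sampling), and the plot in Figure 1 collects these medians across varying target sizes.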