High-Dimensional Geometric Streaming for Nearly Low Rank Data

Authors: Hossein Esfandiari, Praneeth Kacham, Vahab Mirrokni, David Woodruff, Peilin Zhong

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We implement our coreset construction algorithm (Algorithm 1) and show that the coreset has a low distortion both for the ℓ∞ low rank approximation problem and for width estimation. We run Algorithm 1 on a synthetic data set and a real world dataset.
Researcher Affiliation Collaboration *Equal contribution 1Google Research, USA 2Computer Science Department, Carnegie Mellon University, Pittsburgh, USA. Correspondence to: Praneeth Kacham <pkacham@google.com>.
Pseudocode Yes Algorithm 1 Minimize Distance to a Subspace. Algorithm 2 Minimize the ℓp norm of the vector of distances to a k dimensional subspace
Open Source Code No The paper does not provide any explicit statements about releasing source code, nor does it include links to a code repository.
Open Datasets Yes For the real world dataset, we consider a grayscale image (Leung, 2017) of dimensions 1836 × 3264... We repeat the same experiment on a different grayscale image (European Space Agency and NASA, 2006) of dimensions 4690 × 6000
Dataset Splits No The paper describes experiments on synthetic and real-world image datasets, but it does not specify any training, validation, or test splits, nor does it mention cross-validation. The datasets are used in their entirety as matrices.
Hardware Specification No The paper does not provide specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory.
Software Dependencies No The paper does not specify any software dependencies (e.g., programming languages, libraries, or solvers) with version numbers.
Experiment Setup Yes We construct our synthetic dataset as follows: we pick n = 40,000, d = 10,000 and k = 20. We sample an n × k random matrix L and a k × d random matrix R each with i.i.d. uniform random variables drawn from {−100, . . . , 100}. We create an n × d matrix A .= L · R + G where G is a noise matrix with each entry being an i.i.d. uniform random variable drawn from {−5000, . . . , 5000}. We measure the maximum distortion defined as maxx∈rowspace(AS) ∥Ax∥∞ / ∥ASx∥∞. We solve these linear programs and find the max-distortion within the rowspace of AS.