High-Dimensional Geometric Streaming for Nearly Low Rank Data
Authors: Hossein Esfandiari, Praneeth Kacham, Vahab Mirrokni, David Woodruff, Peilin Zhong
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We implement our coreset construction algorithm (Algorithm 1) and show that the coreset has a low distortion both for the ℓ∞ low rank approximation problem and for width estimation. We run Algorithm 1 on a synthetic data set and a real world dataset. |
| Researcher Affiliation | Collaboration | *Equal contribution 1Google Research, USA 2Computer Science Department, Carnegie Mellon University, Pittsburgh, USA. Correspondence to: Praneeth Kacham <pkacham@google.com>. |
| Pseudocode | Yes | Algorithm 1 Minimize Distance to a Subspace. Algorithm 2 Minimize the ℓp norm of the vector of distances to a k dimensional subspace |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code, nor does it include links to a code repository. |
| Open Datasets | Yes | For the real world dataset, we consider a grayscale image (Leung, 2017) of dimensions 1836 × 3264... We repeat the same experiment on a different grayscale image (European Space Agency and NASA, 2006) of dimensions 4690 × 6000 |
| Dataset Splits | No | The paper describes experiments on synthetic and real-world image datasets, but it does not specify any training, validation, or test splits, nor does it mention cross-validation. The datasets are used in their entirety as matrices. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory. |
| Software Dependencies | No | The paper does not specify any software dependencies (e.g., programming languages, libraries, or solvers) with version numbers. |
| Experiment Setup | Yes | We construct our synthetic dataset as follows: we pick n = 40,000, d = 10,000 and k = 20. We sample an n × k random matrix L and a k × d random matrix R each with i.i.d. uniform random variables drawn from {−100, . . . , 100}. We create an n × d matrix A .= L · R + G where G is a noise matrix with each entry being an i.i.d. uniform random variable drawn from {−5000, . . . , 5000}. We measure the maximum distortion defined as maxx∈rowspace(AS) ∥Ax∥∞ / ∥ASx∥∞. We solve these linear programs and find the max-distortion within the rowspace of AS. |