In-Database Regression in Input Sparsity Time

Authors: Rajesh Jayaram, Alireza Samadian, David Woodruff, Peng Ye

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we apply our method to real datasets and show that it is significantly faster than existing algorithms. We study the performance of our sketching method on several real datasets, both for two-table joins and general joins.
Researcher Affiliation | Academia | (1) Computer Science Department, Carnegie Mellon University, Pittsburgh PA, United States. (2) Department of Computer Science, University of Pittsburgh, Pittsburgh PA, United States. (3) Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China.
Pseudocode | Yes | Algorithm 1: Subspace embedding for join J = T1 ⋈ T2. Algorithm 2: Pre-processing step for fast ℓ2-sampling of rows of JY, given input J = T1 ⋈ T2 and Y ∈ ℝ^(d×r). Algorithm 3: Sampling step for fast ℓ2-sampling of rows of JY.
Open Source Code | Yes | Code available at https://github.com/AnonymousFireman/ICML_code
Open Datasets | Yes | We consider two datasets: LastFM (Cantador et al., 2011) and MovieLens (Harper & Konstan, 2015).
Dataset Splits | No | We split the dataset into a training set and a validation set, run the regression on the training set and measure the MSE (mean squared error) on the validation set. However, no specific percentages, sample counts, or detailed splitting methodology (beyond
Hardware Specification | Yes | The implementation is written in MATLAB and run on an Intel Core i7-7500U CPU with 8GB of memory. For general joins... we run it on an Nvidia GTX1080Ti GPU.
Software Dependencies | No | The paper mentions 'MATLAB' and 'Taichi' but does not provide specific version numbers for these software components.
Experiment Setup | No | The paper mentions 'varying lambda' and 'adjust the target dimension', but it does not provide concrete hyperparameter values (e.g., learning rate, batch size, epochs) or detailed training configurations typically found in an experimental setup description.
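The evaluation loop described in the rows above (sketch the training data down to a target dimension, solve a regularized regression with parameter lambda, report validation MSE) can be illustrated with a minimal sketch. This is not the authors' code or their Algorithm 1: it applies a plain CountSketch to an explicit synthetic matrix that stands in for an already materialized join, which is exactly the step the paper's method avoids. The 80/20 split, the value of lambda, the target dimension k, and the weights in true_w are all assumptions made for illustration, since the summary notes that the paper does not state them.

```python
import random

def count_sketch(A, b, k, rng):
    """Hash each row of [A | b] into one of k buckets with a random sign."""
    d = len(A[0])
    SA = [[0.0] * d for _ in range(k)]
    Sb = [0.0] * k
    for row, y in zip(A, b):
        bucket = rng.randrange(k)
        sign = rng.choice((-1.0, 1.0))
        for j, v in enumerate(row):
            SA[bucket][j] += sign * v
        Sb[bucket] += sign * y
    return SA, Sb

def ridge(A, b, lam):
    """Solve (A^T A + lam*I) x = A^T b by Gauss-Jordan elimination."""
    d = len(A[0])
    # Augmented d x (d+1) system [A^T A + lam*I | A^T b].
    M = []
    for i in range(d):
        row = [sum(r[i] * r[j] for r in A) + (lam if i == j else 0.0)
               for j in range(d)]
        row.append(sum(r[i] * y for r, y in zip(A, b)))
        M.append(row)
    for c in range(d):
        p = max(range(c, d), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        for r in range(d):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * v for u, v in zip(M[r], M[c])]
    return [M[i][d] / M[i][i] for i in range(d)]

def mse(A, b, x):
    """Mean squared error of the linear predictor x on (A, b)."""
    return sum((sum(u * w for u, w in zip(r, x)) - y) ** 2
               for r, y in zip(A, b)) / len(b)

rng = random.Random(0)
true_w = [2.0, -1.0, 0.5]  # hypothetical ground-truth weights
A = [[rng.gauss(0, 1) for _ in true_w] for _ in range(2500)]
b = [sum(u * w for u, w in zip(r, true_w)) + rng.gauss(0, 0.1) for r in A]

# Assumed 80/20 split; the paper does not state its split sizes.
A_train, b_train = A[:2000], b[:2000]
A_val, b_val = A[2000:], b[2000:]

lam = 1e-3  # assumed regularization strength
SA, Sb = count_sketch(A_train, b_train, k=200, rng=rng)  # k = target dimension
x_exact = ridge(A_train, b_train, lam)   # regression on all training rows
x_sketch = ridge(SA, Sb, lam)            # regression on the 200-row sketch
mse_exact = mse(A_val, b_val, x_exact)
mse_sketch = mse(A_val, b_val, x_sketch)
print(mse_exact, mse_sketch)
```

Varying lam and k in this script mirrors, in a loose way, the 'varying lambda' and 'adjust the target dimension' knobs mentioned in the Experiment Setup row. A smaller k makes the sketched solve cheaper at the cost of a larger gap between the two validation errors.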