reproducibilityindex.ai

In-Database Regression in Input Sparsity Time

Authors: Rajesh Jayaram, Alireza Samadian, David Woodruff, Peng Ye

ICML 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirically, we apply our method to real datasets and show that it is significantly faster than existing algorithms. We study the performance of our sketching method on several real datasets, both for two-table joins and general joins.
Researcher Affiliation	Academia	(1) Computer Science Department, Carnegie Mellon University, Pittsburgh PA, United States. (2) Department of Computer Science, University of Pittsburgh, Pittsburgh PA, United States. (3) Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China.
Pseudocode	Yes	Algorithm 1 Subspace embedding for join J = T1 ⋈ T2. Algorithm 2 Pre-processing step for fast ℓ2-sampling of rows of J·Y, given input J = T1 ⋈ T2 and Y ∈ R^{d×r}. Algorithm 3 Sampling step for fast ℓ2-sampling of rows of J·Y.
Open Source Code	Yes	Code available at https://github.com/AnonymousFireman/ICML_code
Open Datasets	Yes	We consider two datasets: LastFM (Cantador et al., 2011) and MovieLens (Harper & Konstan, 2015).
Dataset Splits	No	We split the dataset into a training set and a validation set, run the regression on the training set and measure the MSE (mean squared error) on the validation set. However, no specific percentages, sample counts, or detailed splitting methodology (beyond
Hardware Specification	Yes	The implementation is written in MATLAB and run on an Intel Core i7-7500U CPU with 8GB of memory. For general joins... we run it on an Nvidia GTX1080Ti GPU.
Software Dependencies	No	The paper mentions 'MATLAB' and 'Taichi' but does not provide specific version numbers for these software components.
Experiment Setup	No	The paper mentions 'varying lambda' and 'adjust the target dimension', but it does not provide concrete hyperparameter values (e.g., learning rate, batch size, epochs) or detailed training configurations typically found in an experimental setup description.
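
The evaluation protocol quoted in the Dataset Splits and Experiment Setup rows (fit on a training set, report validation MSE while varying lambda) can be sketched as below. This is a minimal illustration in Python, not the authors' MATLAB code: it uses plain closed-form ridge regression with no sketching, and the 80/20 split, the seed, and the function names `solve`, `ridge_fit`, `mse`, and `validation_curve` are all assumptions, since the paper does not report these details.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ridge_fit(X, y, lam):
    """Closed-form ridge regression: solve (X^T X + lam * I) w = X^T y."""
    d = len(X[0])
    gram = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
             for j in range(d)] for i in range(d)]
    rhs = [sum(row[i] * t for row, t in zip(X, y)) for i in range(d)]
    return solve(gram, rhs)


def mse(X, y, w):
    """Mean squared error of the linear predictor w on (X, y)."""
    errors = (sum(a * b for a, b in zip(row, w)) - t for row, t in zip(X, y))
    return sum(e * e for e in errors) / len(y)


def validation_curve(X, y, lambdas, train_fraction=0.8, seed=0):
    """Return {lambda: validation MSE}. The split ratio is an assumption."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    cut = int(train_fraction * len(idx))
    train, valid = idx[:cut], idx[cut:]
    X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
    X_va, y_va = [X[i] for i in valid], [y[i] for i in valid]
    return {lam: mse(X_va, y_va, ridge_fit(X_tr, y_tr, lam)) for lam in lambdas}
```

Calling `validation_curve(X, y, [0.01, 0.1, 1.0, 10.0])` on a feature matrix given as a list of rows returns one validation MSE per lambda. The lambda grid shown here is hypothetical; the paper mentions varying lambda without listing values.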