Distributed Least Squares in Small Space via Sketching and Bias Reduction

Authors: Sachin Garg, Kevin Tan, Michał Dereziński

NeurIPS 2024

Reproducibility assessment (variable, result, and supporting LLM response):
Research Type: Experimental
LLM Response: In Figure 2, on the X-axis we plot the number q of estimators being averaged, so that the bias of a single estimator appears on the right-hand side of the plot (large q), whereas the variance (error) appears on the left-hand side (q = 1). In each plot, the line with nnz per row = 1 (and large sketch size) corresponds to uniform subsampling, whereas the remaining ones are sketches produced by compressing that subsample. The plot shows that decreasing the sketch size (i.e., compressing the sample) does increase the error of a single estimator, as expected; however, it also shows that the bias of these estimators remains essentially unchanged regardless of the sketch size (since all lines meet as q → ∞), confirming that suitable sparse sketches that compress rows of data can preserve near-unbiasedness without increasing the cost.
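As a concrete illustration of the averaging experiment quoted above, here is a minimal NumPy sketch (an assumption-laden reconstruction, not the authors' released code; the matrix sizes, sketch construction, and value of q are illustrative). Each row of the sketching matrix S carries k nonzero entries, so k = 1 with a large sketch size corresponds to uniform subsampling, and holding the product m * k fixed keeps the total sketching cost constant:

```python
import numpy as np

def sparse_sketch(m, n, k, rng):
    # m-by-n sketching matrix with k nonzeros (+/- 1/sqrt(k)) per row;
    # k = 1 reduces to uniform row subsampling (up to signs and scaling).
    S = np.zeros((m, n))
    for i in range(m):
        cols = rng.choice(n, size=k, replace=False)
        S[i, cols] = rng.choice([-1.0, 1.0], size=k) / np.sqrt(k)
    return S

rng = np.random.default_rng(0)
n, d = 2000, 20
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + rng.standard_normal(n)
x_star = np.linalg.lstsq(A, b, rcond=None)[0]  # full least-squares solution

q = 64  # number of independent estimators being averaged
for m, k in [(400, 1), (200, 2), (100, 4)]:  # fixed sketching cost: m * k = 400
    ests = [np.linalg.lstsq(S @ A, S @ b, rcond=None)[0]
            for S in (sparse_sketch(m, n, k, rng) for _ in range(q))]
    single = np.linalg.norm(ests[0] - x_star)                  # variance-dominated
    averaged = np.linalg.norm(np.mean(ests, axis=0) - x_star)  # bias-dominated
    print(f"m={m}, nnz/row={k}: single error {single:.3f}, "
          f"average-of-{q} error {averaged:.3f}")
```

Averaging drives down the variance, so the error of the averaged estimator at large q approximates the bias; per the quoted response, that bias stays essentially flat as the sketch size shrinks.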
Researcher Affiliation: Academia
LLM Response: Sachin Garg, Computer Science & Engineering, University of Michigan, sachg@umich.edu; Kevin Tan, Department of Statistics, University of Pennsylvania, kevtan@umich.edu; Michał Dereziński, Computer Science & Engineering, University of Michigan, derezin@umich.edu
Pseudocode: No
LLM Response: The paper describes the algorithms and methods in prose, but it does not include formal pseudocode blocks or algorithm figures.
Open Source Code: Yes
LLM Response: A supplementary zip file containing the code is provided to reproduce the experimental results.
Open Datasets: Yes
LLM Response: We examine three benchmark regression datasets, Abalone (4177 rows, 8 features), Boston (506 rows, 13 features), and Year Prediction MSD (truncated to the first 2500 rows, with 90 features), from the LIBSVM repository [8]. ... One of the experiments requires the YearPredictionMSD.txt file, which can be obtained at https://archive.ics.uci.edu/dataset/203/yearpredictionmsd.
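For reference, a hedged loading sketch (the file names and paths below are assumptions; the paper does not list its loading code). LIBSVM-format files can be read with scikit-learn, and the UCI YearPredictionMSD.txt file is comma-separated with the target year in the first column:

```python
import numpy as np
from sklearn.datasets import load_svmlight_file

# LIBSVM-format datasets (the file names below are assumptions).
A_abalone, b_abalone = load_svmlight_file("abalone")  # 4177 rows, 8 features
A_boston, b_boston = load_svmlight_file("housing")    # 506 rows, 13 features (Boston)

# YearPredictionMSD.txt: comma-separated, first column is the target year,
# remaining 90 columns are features; the paper truncates to the first 2500 rows.
msd = np.loadtxt("YearPredictionMSD.txt", delimiter=",", max_rows=2500)
b_msd, A_msd = msd[:, 0], msd[:, 1:]
```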
Dataset Splits: No
LLM Response: The paper does not specify exact training, validation, or test dataset splits. It describes experimental parameters like sketch size and number of machines but not data partitioning ratios.
Hardware Specification: Yes
LLM Response: All the experiments were run on an i9-13900K processor with 128GB of RAM and an NVIDIA RTX 3090 GPU.
Software Dependencies: No
LLM Response: The paper mentions the use of certain datasets and refers to previous work, but it does not specify any software dependencies with version numbers (e.g., Python, PyTorch, NumPy versions).
Experiment Setup: Yes
LLM Response: Within each dataset, we perform four simulations, designed so that the total sketching cost stays the same for all four test cases, by simultaneously changing sketch size and sparsity. Concretely, we vary these so that the product (sketch size × nnz per row) stays the same in each case, so as to ensure that the total cost of sketching is fixed in each plot. In Figure 2, on the X-axis we plot the number q of estimators being averaged.
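To make the fixed-budget design concrete, a small hypothetical helper (the function name, parameters, and example budget are assumptions, not taken from the paper's code) can enumerate equal-cost (sketch size, nnz per row) pairs:

```python
def fixed_cost_configs(budget, nnz_options=(1, 2, 4, 8)):
    # Enumerate (sketch_size, nnz_per_row) pairs whose product equals `budget`,
    # so every test case incurs the same total sketching cost.
    return [(budget // k, k) for k in nnz_options if budget % k == 0]

# A budget of 800 row-nonzeros yields four equal-cost simulations per dataset:
print(fixed_cost_configs(800))  # [(800, 1), (400, 2), (200, 4), (100, 8)]
```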