Unbiased estimates for linear regression via volume sampling

Authors: Michał Dereziński, Manfred K. Warmuth

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | Given a full rank matrix X with more columns than rows, consider the task of estimating the pseudo inverse X⁺ based on the pseudo inverse of a sampled subset of columns (of size at least the number of rows). We show that this is possible if the subset of columns is chosen proportional to the squared volume spanned by the rows of the chosen submatrix (i.e., volume sampling). The resulting estimator is unbiased and, surprisingly, the covariance of the estimator also has a closed form: it equals a specific factor times X⁺ᵀX⁺. The pseudo inverse plays an important part in solving the linear least squares problem, where we try to predict a label for each column of X. We assume labels are expensive and we are only given the labels for the small subset of columns we sample from X. Using our methods we show that the weight vector of the solution for the subproblem is an unbiased estimator of the optimal solution for the whole problem based on all column labels. We believe that these new formulas establish a fundamental connection between linear least squares and volume sampling. We use our methods to obtain an algorithm for volume sampling that is faster than the state of the art, and to obtain bounds for the total loss of the estimated least-squares solution on all labeled columns. (A least-squares sketch of this setup follows the table.)
Researcher Affiliation | Academia | Michał Dereziński, Department of Computer Science, University of California Santa Cruz, mderezin@ucsc.edu; Manfred K. Warmuth, Department of Computer Science, University of California Santa Cruz, manfred@ucsc.edu
Pseudocode | Yes | Reverse iterative volume sampling. Input: X ∈ R^{d×n}, s ∈ {d, .., n}. Z ← (X Xᵀ)⁻¹; for all i ∈ {1, .., n}: p_i ← 1 − x_iᵀ Z x_i; S ← {1, .., n}; while |S| > s: sample i ∝ p_i out of S; S ← S \ {i}; v ← Z x_i / √p_i; for all j ∈ S: p_j ← p_j − (x_jᵀ v)²; Z ← Z + v vᵀ; end; return S. (A NumPy sketch of this procedure follows the table.)
Open Source Code | No | The paper does not provide any link or explicit statement about the availability of source code for the methodology described.
Open Datasets | No | The paper is theoretical and does not use or refer to specific datasets, nor does it provide concrete access information for any dataset.
Dataset Splits | No | The paper is theoretical and does not describe any dataset splits for training, validation, or testing.
Hardware Specification | No | The paper describes theoretical work and does not mention any specific hardware used for running experiments.
Software Dependencies | No | The paper is theoretical and does not mention any software dependencies with specific version numbers.
Experiment Setup | No | The paper is theoretical and does not describe any experimental setup details such as hyperparameters or system-level training settings.
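
To make the least-squares setup quoted in the Research Type row concrete, here is a minimal NumPy sketch of the full-problem and sub-problem solutions. The function names are illustrative and not from the paper; the paper's claim is that the sub-problem solution is an unbiased estimator of the full solution when the column subset is volume sampled.

import numpy as np

def full_least_squares(X, y):
    """Optimal weight vector w* minimizing sum_i (w^T x_i - y_i)^2,
    where the x_i are the columns of the d x n matrix X."""
    # For full-rank X, w* = (X^T)^+ y = (X X^T)^{-1} X y.
    return np.linalg.pinv(X.T) @ y

def subset_least_squares(X, y, S):
    """Least-squares solution computed only from the labeled columns in S.
    The paper shows E[w(S)] = w* when S is drawn by volume sampling."""
    S = list(S)
    return np.linalg.pinv(X[:, S].T) @ y[S]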
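
The pseudocode row above can also be turned into runnable code. Below is a minimal NumPy sketch of reverse iterative volume sampling under my reading of that pseudocode; the function name and the use of NumPy's random generator are my own choices, not from the paper.

import numpy as np

def reverse_iterative_volume_sampling(X, s, rng=None):
    """Sample a subset S of s column indices of the d x n full-rank matrix X
    (with d <= s <= n) proportionally to the squared volume spanned by the
    rows of the submatrix X_S, removing one column at a time."""
    rng = np.random.default_rng() if rng is None else rng
    d, n = X.shape
    Z = np.linalg.inv(X @ X.T)                      # Z <- (X X^T)^{-1}
    p = 1.0 - np.einsum('ji,jk,ki->i', X, Z, X)     # p_i <- 1 - x_i^T Z x_i
    S = list(range(n))
    while len(S) > s:
        idx = np.array(S)
        i = rng.choice(idx, p=p[idx] / p[idx].sum())  # sample i proportional to p_i
        S.remove(i)                                 # S <- S \ {i}
        v = Z @ X[:, i] / np.sqrt(p[i])             # v <- Z x_i / sqrt(p_i)
        rest = np.array(S)
        p[rest] -= (X[:, rest].T @ v) ** 2          # p_j <- p_j - (x_j^T v)^2 for j in S
        Z += np.outer(v, v)                         # Z <- Z + v v^T
    return S

As a usage note, one draw of the estimator discussed in the abstract would be S = reverse_iterative_volume_sampling(X, s) followed by subset_least_squares(X, y, S), using the sketch above.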