Near-Optimal Linear Regression under Distribution Shift

Authors: Qi Lei, Wei Hu, Jason Lee

ICML 2021

Reproducibility variables (Variable: Result, with supporting LLM response):
Research Type: Experimental. LLM response: "6. Experiments — Our estimators are provably near-optimal for the worst-case β. However, it remains unknown whether they outperform other baselines on average. With synthetic data, we explore their performance with random β. We are also interested in investigating the conditions under which we win by more."
Researcher Affiliation: Academia. LLM response: "Princeton University. Correspondence to: Qi Lei <qilei@princeton.edu>, Wei Hu <huwei@cs.princeton.edu>, Jason D. Lee <jasonlee@princeton.edu>."
Pseudocode: Yes. LLM response: "Our meta-algorithm. Our paper considers different settings with distribution shift. Our methods are unified under the following meta-algorithm: Step 1: Construct an unbiased sufficient statistic β̂_SS for the unknown parameter. Step 2: Construct β̂_MM, a linear function of the sufficient statistic β̂_SS that minimizes L_B(β̂_MM)."
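The two-step meta-algorithm quoted above can be sketched in code. This is a minimal illustration, not the paper's implementation: it assumes a covariate-shift linear-regression setting, uses OLS as the unbiased sufficient statistic in Step 1, and, for simplicity, restricts Step 2's linear map to a scalar shrinkage c·I chosen to minimize a worst-case target-risk bound over ‖β‖ ≤ r (the paper optimizes over general linear maps). The covariance choice for Sigma_T here is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_s, d, sigma = 2000, 50, 1.0
r = np.sqrt(d)

# Illustrative target covariance (not the paper's exact choice).
Sigma_T = np.diag(1.0 / np.arange(1, d + 1))

# Source data and noisy labels from a ground-truth linear model.
X = rng.normal(size=(n_s, d))
beta = rng.normal(size=d)
beta *= r / np.linalg.norm(beta)          # rescale to norm r
y = X @ beta + sigma * rng.normal(size=n_s)

# Step 1: unbiased sufficient statistic (here, the OLS estimator),
# whose covariance is sigma^2 (X^T X)^{-1}.
XtX_inv = np.linalg.inv(X.T @ X)
beta_ss = XtX_inv @ X.T @ y

# Step 2: a linear function of beta_ss minimizing worst-case target risk.
# Restricted to c * I, the worst-case risk over ||beta|| <= r is
#   (1 - c)^2 * r^2 * lam_max(Sigma_T) + c^2 * sigma^2 * tr(Sigma_T XtX_inv),
# minimized in closed form at c = A / (A + B).
A = r**2 * np.linalg.eigvalsh(Sigma_T)[-1]
B = sigma**2 * np.trace(Sigma_T @ XtX_inv)
c = A / (A + B)
beta_mm = c * beta_ss
```

Shrinking toward zero trades a small bias for reduced variance under the target covariance; with general linear maps (as in the paper) the shrinkage can differ per eigen-direction of Σ_T.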
Open Source Code: No. The paper does not contain any statement about making its source code publicly available, nor a link to a code repository.
Open Datasets: Yes. LLM response: "Experiments on Berkeley Yearbook Dataset — To verify the performance of our algorithm on real-world data, we conduct an experiment on the Berkeley Yearbook dataset (Ginosar et al., 2015)."
Dataset Splits: Yes. LLM response: "We also have an additional 200 labeled data points from the target domain as a validation set, used only for hyper-parameter tuning."
Hardware Specification: No. The paper does not provide specific hardware details (e.g., CPU or GPU models, or memory) used for running its experiments.
Software Dependencies: No. The paper does not provide specific software dependencies, such as programming languages, libraries, or solver names with version numbers, needed to replicate the experiments.
Experiment Setup: Yes. LLM response: "We set n_S = 2000, d = 50, σ = 1, r = √d. For each setting, we sample β_T from the standard normal distribution and rescale it to have norm r. We estimate Σ_T from n_U = 2000 unlabeled samples. ... We choose λ_i(Σ_S) ∝ i and λ_i(Σ_T) ∝ 1/i, and the eigenspaces of both Σ_S and Σ_T are random orthonormal matrices (‖Σ_S‖²_F = ‖Σ_T‖²_F = d). The ground-truth model is a one-hidden-layer ReLU network f(x) = (1/√d)·aᵀ(Wx)_+, where W and a are randomly generated from the standard Gaussian distribution. We observe noisy labels y_S = f(x) + z, where z_i ∼ N(0, σ²)."
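The synthetic setup quoted above can be reproduced as a short script. This is a sketch under stated assumptions: the excerpt is partially garbled, so the 1/√d scaling of the ReLU teacher, the rescaling of eigenvalues to satisfy ‖Σ‖²_F = d, and the QR-based construction of random orthonormal eigenspaces are our reading of it, and Gaussian covariates are assumed.

```python
import numpy as np

rng = np.random.default_rng(0)
n_s, n_u, d, sigma = 2000, 2000, 50, 1.0

def random_cov(eigvals):
    """Covariance with the given spectrum (rescaled so ||Sigma||_F^2 = d)
    and a random orthonormal eigenspace drawn via QR of a Gaussian matrix."""
    Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
    lam = eigvals * np.sqrt(d / np.sum(eigvals**2))
    return Q @ np.diag(lam) @ Q.T

i = np.arange(1, d + 1, dtype=float)
Sigma_S = random_cov(i)        # lambda_i(Sigma_S) proportional to i
Sigma_T = random_cov(1.0 / i)  # lambda_i(Sigma_T) proportional to 1/i

# Ground-truth one-hidden-layer ReLU network f(x) = a^T (W x)_+ / sqrt(d),
# with W and a drawn from the standard Gaussian distribution.
W = rng.normal(size=(d, d))
a = rng.normal(size=d)
f = lambda X: np.maximum(X @ W.T, 0.0) @ a / np.sqrt(d)

# n_S labeled source samples with noisy labels y_S = f(x) + z, z ~ N(0, sigma^2).
X_S = rng.normal(size=(n_s, d)) @ np.linalg.cholesky(Sigma_S).T
y_S = f(X_S) + sigma * rng.normal(size=n_s)

# n_U unlabeled target samples, used to estimate Sigma_T.
X_U = rng.normal(size=(n_u, d)) @ np.linalg.cholesky(Sigma_T).T
Sigma_T_hat = X_U.T @ X_U / n_u
```

Drawing β_T as described in the quote amounts to `b = rng.normal(size=d); b *= np.sqrt(d) / np.linalg.norm(b)`.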