Near-Optimal Linear Regression under Distribution Shift
Authors: Qi Lei, Wei Hu, Jason D. Lee
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 6 (Experiments): Our estimators are provably near-optimal for the worst-case β. However, it remains unknown whether they outperform other baselines on average. With synthetic data we explore their performance with random β. We are also interested in investigating the conditions under which we win by larger margins. |
| Researcher Affiliation | Academia | Princeton University. Correspondence to: Qi Lei <qilei@princeton.edu>, Wei Hu <huwei@cs.princeton.edu>, Jason D. Lee <jasonlee@princeton.edu>. |
| Pseudocode | Yes | Our meta-algorithm. Our paper considers different settings with distribution shift. Our methods are unified under the following meta-algorithm: Step 1: Construct an unbiased sufficient statistic β̂_SS for the unknown parameter. Step 2: Construct β̂_MM, a linear function of the sufficient statistic β̂_SS that minimizes L_B(β̂_MM). (A hedged Python sketch of this two-step structure follows the table.) |
| Open Source Code | No | The paper does not contain any statement about making its source code publicly available or a link to a code repository. |
| Open Datasets | Yes | Experiments on Berkeley Yearbook Dataset: To verify the performance of our algorithm on real-world data, we conduct an experiment on the Berkeley Yearbook dataset (Ginosar et al., 2015). |
| Dataset Splits | Yes | We also have an additional 200 labeled samples from the target domain as a validation set, used only for hyper-parameter tuning. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, or memory) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependency details, such as programming languages, libraries, or solver names with version numbers, used to replicate the experiment. |
| Experiment Setup | Yes | We set n_S = 2000, d = 50, σ = 1, r = √d. For each setting, we sample β_T from the standard normal distribution and rescale it to have norm r. We estimate Σ_T from n_U = 2000 unlabeled samples. ... We choose λ_i(Σ_S) ∝ i, λ_i(Σ_T) ∝ 1/i, and the eigenspaces of both Σ_S and Σ_T are random orthonormal matrices (‖Σ_S‖_F² = ‖Σ_T‖_F² = d). The ground-truth model is a one-hidden-layer ReLU network: f*(x) = (1/d)·aᵀ(Wx)₊, where W and a are randomly generated from the standard Gaussian distribution. We observe noisy labels y_S = f*(x) + z, where z_i ∼ N(0, σ²). (A simulation sketch of this setup also follows the table.) |
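To make the quoted meta-algorithm concrete, here is a minimal NumPy sketch of its two-step structure for the covariate-shift linear-regression setting: Step 1 takes the OLS estimator on the labeled source data as the unbiased sufficient statistic, and Step 2 applies a linear map of that statistic chosen to reduce worst-case target risk over the ball ‖β‖ ≤ r. The per-direction shrinkage in Step 2 (splitting the signal budget r² evenly across eigen-directions of Σ_T) is a simplified stand-in, not the paper's exact minimax solution, and all names are illustrative.

```python
import numpy as np

def meta_estimator(X_S, y_S, Sigma_T, r, sigma):
    """Two-step meta-algorithm structure (hedged sketch).

    Step 1: an unbiased sufficient statistic for beta -- here, OLS
    on the labeled source data.
    Step 2: a linear function of that statistic chosen to reduce
    worst-case target risk; the per-direction shrinkage below is a
    simplified stand-in for the paper's exact minimax step.
    """
    n_S, d = X_S.shape

    # Step 1: OLS estimate beta_ss = (X^T X)^{-1} X^T y (unbiased).
    G = X_S.T @ X_S
    beta_ss = np.linalg.solve(G, X_S.T @ y_S)

    # Step 2: work in the eigenbasis of Sigma_T and shrink each
    # coordinate toward zero. The noise variance of beta_ss along
    # eigenvector u is sigma^2 * u^T G^{-1} u; the signal budget r^2
    # is split evenly across directions (an assumption made for this
    # sketch, not the paper's minimax weighting).
    _, U = np.linalg.eigh(Sigma_T)
    G_inv = np.linalg.inv(G)
    coords = U.T @ beta_ss
    noise_var = sigma**2 * np.einsum('ji,jk,ki->i', U, G_inv, U)
    signal = r**2 / d
    shrink = signal / (signal + noise_var)
    return U @ (shrink * coords)
```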
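The synthetic setup quoted in the last row can likewise be reproduced as a short simulation. The sketch below follows the stated constants (n_S = n_U = 2000, d = 50, σ = 1, r = √d), the eigenvalue profiles λ_i(Σ_S) ∝ i and λ_i(Σ_T) ∝ 1/i with random orthonormal eigenbases rescaled so ‖Σ‖_F² = d, and both ground truths (rescaled Gaussian β_T and the one-hidden-layer ReLU); the hidden width (taken as d) and the seed are assumptions not fixed by the quoted text.

```python
import numpy as np

rng = np.random.default_rng(0)           # seed is an arbitrary choice
n_S, n_U, d, sigma = 2000, 2000, 50, 1.0
r = np.sqrt(d)

def random_cov(eigvals, rng):
    """Covariance with the given eigenvalue profile, a random
    orthonormal eigenbasis, and ||Sigma||_F^2 rescaled to d."""
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    lam = eigvals * np.sqrt(d / np.sum(eigvals**2))
    return Q @ np.diag(lam) @ Q.T

idx = np.arange(1, d + 1, dtype=float)
Sigma_S = random_cov(idx, rng)           # lambda_i(Sigma_S) ∝ i
Sigma_T = random_cov(1.0 / idx, rng)     # lambda_i(Sigma_T) ∝ 1/i

# Linear experiment: beta_T ~ N(0, I), rescaled to norm r.
beta_T = rng.standard_normal(d)
beta_T *= r / np.linalg.norm(beta_T)

# Misspecified experiment: one-hidden-layer ReLU ground truth
# f*(x) = (1/d) a^T (W x)_+ with Gaussian W, a (width d assumed).
W = rng.standard_normal((d, d))
a = rng.standard_normal(d)
f_star = lambda X: np.maximum(X @ W.T, 0.0) @ a / d

# Source samples with noisy labels, plus unlabeled target samples
# used only to form the plug-in estimate of Sigma_T.
X_S = rng.standard_normal((n_S, d)) @ np.linalg.cholesky(Sigma_S).T
y_S = f_star(X_S) + sigma * rng.standard_normal(n_S)
X_U = rng.standard_normal((n_U, d)) @ np.linalg.cholesky(Sigma_T).T
Sigma_T_hat = X_U.T @ X_U / n_U
```

Feeding these into the sketch above, e.g. `meta_estimator(X_S, y_S, Sigma_T_hat, r, sigma)`, and scoring the target excess risk (β̂ − β_T)ᵀ Σ_T (β̂ − β_T) reproduces the shape of the comparison the paper runs, under the stated simplifications.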