Near-Optimal Linear Regression under Distribution Shift
Authors: Qi Lei, Wei Hu, Jason D. Lee
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 6 (Experiments): Our estimators are provably near-optimal for the worst-case β. However, it remains unknown whether they outperform other baselines on average. With synthetic data we explore their performance with random β. We are also interested in investigating the conditions under which we win by larger margins. |
| Researcher Affiliation | Academia | Princeton University. Correspondence to: Qi Lei <qilei@princeton.edu>, Wei Hu <huwei@cs.princeton.edu>, Jason D. Lee <jasonlee@princeton.edu>. |
| Pseudocode | Yes | Our meta-algorithm. Our paper considers different settings with distribution shift. Our methods are unified under the following meta-algorithm: Step 1: Construct an unbiased sufficient statistic β̂_SS for the unknown parameter. Step 2: Construct β̂_MM, a linear function of the sufficient statistic β̂_SS that minimizes L_B(β̂_MM). (A hedged Python sketch of this two-step structure follows the table.) |
| Open Source Code | No | The paper does not contain any statement about making its source code publicly available or a link to a code repository. |
| Open Datasets | Yes | Experiments on Berkeley Yearbook Dataset: To verify the performance of our algorithm on real-world data, we conduct an experiment on the Berkeley Yearbook dataset (Ginosar et al., 2015). |
| Dataset Splits | Yes | We also have an additional 200 labeled samples from the target domain as a validation set, used only for hyper-parameter tuning. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, or memory) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependency details, such as programming languages, libraries, or solver names with version numbers, used to replicate the experiment. |
| Experiment Setup | Yes | We set n_S = 2000, d = 50, σ = 1, r = √d. For each setting, we sample β_T from the standard normal distribution and rescale it to have norm r. We estimate Σ_T from n_U = 2000 unlabeled samples. ... We choose λ_i(Σ_S) ∝ i, λ_i(Σ_T) ∝ 1/i, and the eigenspaces of both Σ_S and Σ_T are random orthonormal matrices (‖Σ_S‖_F² = ‖Σ_T‖_F² = d). The ground-truth model is a one-hidden-layer ReLU network: f*(x) = (1/d)·aᵀ(Wx)₊, where W and a are randomly generated from the standard Gaussian distribution. We observe noisy labels y_S = f*(x) + z, where z_i ∼ N(0, σ²). (A simulation sketch of this setup also follows the table.) |
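To make the quoted meta-algorithm concrete, here is a minimal NumPy sketch of its two-step structure for the covariate-shift linear-regression setting: Step 1 takes the OLS estimator on the labeled source data as the unbiased sufficient statistic, and Step 2 applies a linear map of that statistic chosen to reduce worst-case target risk over the ball ‖β‖ ≤ r. The per-direction shrinkage in Step 2 (splitting the signal budget r² evenly across eigen-directions of Σ_T) is a simplified stand-in, not the paper's exact minimax solution, and all names are illustrative.

```python
import numpy as np

def meta_estimator(X_S, y_S, Sigma_T, r, sigma):
    """Two-step meta-algorithm structure (hedged sketch).

    Step 1: an unbiased sufficient statistic for beta -- here, OLS
    on the labeled source data.
    Step 2: a linear function of that statistic chosen to reduce
    worst-case target risk; the per-direction shrinkage below is a
    simplified stand-in for the paper's exact minimax step.
    """
    n_S, d = X_S.shape

    # Step 1: OLS estimate beta_ss = (X^T X)^{-1} X^T y (unbiased).
    G = X_S.T @ X_S
    beta_ss = np.linalg.solve(G, X_S.T @ y_S)

    # Step 2: work in the eigenbasis of Sigma_T and shrink each
    # coordinate toward zero. The noise variance of beta_ss along
    # eigenvector u is sigma^2 * u^T G^{-1} u; the signal budget r^2
    # is split evenly across directions (an assumption made for this
    # sketch, not the paper's minimax weighting).
    _, U = np.linalg.eigh(Sigma_T)
    G_inv = np.linalg.inv(G)
    coords = U.T @ beta_ss
    noise_var = sigma**2 * np.einsum('ji,jk,ki->i', U, G_inv, U)
    signal = r**2 / d
    shrink = signal / (signal + noise_var)
    return U @ (shrink * coords)
```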
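The synthetic setup quoted in the last row can likewise be reproduced as a short simulation. The sketch below follows the stated constants (n_S = n_U = 2000, d = 50, σ = 1, r = √d), the eigenvalue profiles λ_i(Σ_S) ∝ i and λ_i(Σ_T) ∝ 1/i with random orthonormal eigenbases rescaled so ‖Σ‖_F² = d, and both ground truths (rescaled Gaussian β_T and the one-hidden-layer ReLU); the hidden width (taken as d) and the seed are assumptions not fixed by the quoted text.

```python
import numpy as np

rng = np.random.default_rng(0)           # seed is an arbitrary choice
n_S, n_U, d, sigma = 2000, 2000, 50, 1.0
r = np.sqrt(d)

def random_cov(eigvals, rng):
    """Covariance with the given eigenvalue profile, a random
    orthonormal eigenbasis, and ||Sigma||_F^2 rescaled to d."""
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    lam = eigvals * np.sqrt(d / np.sum(eigvals**2))
    return Q @ np.diag(lam) @ Q.T

idx = np.arange(1, d + 1, dtype=float)
Sigma_S = random_cov(idx, rng)           # lambda_i(Sigma_S) ∝ i
Sigma_T = random_cov(1.0 / idx, rng)     # lambda_i(Sigma_T) ∝ 1/i

# Linear experiment: beta_T ~ N(0, I), rescaled to norm r.
beta_T = rng.standard_normal(d)
beta_T *= r / np.linalg.norm(beta_T)

# Misspecified experiment: one-hidden-layer ReLU ground truth
# f*(x) = (1/d) a^T (W x)_+ with Gaussian W, a (width d assumed).
W = rng.standard_normal((d, d))
a = rng.standard_normal(d)
f_star = lambda X: np.maximum(X @ W.T, 0.0) @ a / d

# Source samples with noisy labels, plus unlabeled target samples
# used only to form the plug-in estimate of Sigma_T.
X_S = rng.standard_normal((n_S, d)) @ np.linalg.cholesky(Sigma_S).T
y_S = f_star(X_S) + sigma * rng.standard_normal(n_S)
X_U = rng.standard_normal((n_U, d)) @ np.linalg.cholesky(Sigma_T).T
Sigma_T_hat = X_U.T @ X_U / n_U
```

Feeding these into the sketch above, e.g. `meta_estimator(X_S, y_S, Sigma_T_hat, r, sigma)`, and scoring the target excess risk (β̂ − β_T)ᵀ Σ_T (β̂ − β_T) reproduces the shape of the comparison the paper runs, under the stated simplifications.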