reproducibilityindex.ai

Sketched Ridgeless Linear Regression: The Role of Downsampling

Authors: Xin Chen, Yicheng Zeng, Siyue Yang, Qiang Sun

ICML 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Numerical studies strongly support our theory. [...] Figure 1. Asymptotic risk curves for the ridgeless least square estimator. [...] Figure 2. Asymptotic risk curves for sketched ridgeless least square estimators with orthogonal and i.i.d. sketching under isotropic features. [...] We conducted numerical studies with 500 replications.
Researcher Affiliation	Academia	¹Department of Operations Research and Financial Engineering, Princeton University, 98 Charlton St, Princeton, NJ 08544, USA. ²Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong, 2001 Longxiang Boulevard, Shenzhen, Guangdong, China. ³Department of Statistical Sciences, University of Toronto, 700 University Ave, Toronto, ON M5G 1X6, Canada.
Pseudocode	No	The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code	Yes	Our implementation is available at https://github.com/statsle/SRLR_python.
Open Datasets	No	The paper uses synthetically generated data rather than a named, publicly available dataset. For example: "Each row of the feature matrix $X \in \mathbb{R}^{n \times p}$ is i.i.d. drawn from $N(0, I_p)$." and "Each row of $X \in \mathbb{R}^{n \times p}$ is i.i.d. drawn from $N_p(0, \Sigma)$".
Dataset Splits	Yes	For each replication, we generated $\beta \sim N_p(0, (\alpha^2/p) I_p)$ and created a training dataset $(X, Y)$ with $n = 400$ training samples, a validation dataset $\{(x_{\mathrm{val},i}, y_{\mathrm{val},i}) : 1 \le i \le n_{\mathrm{val}}\}$ with $n_{\mathrm{val}} \in \{20, 100, 200\}$ validation samples, and a testing dataset $\{(x_{\mathrm{new},i}, y_{\mathrm{new},i}) : 1 \le i \le n_{\mathrm{new}}\}$ with $n_{\mathrm{new}} = 100$ testing samples.
Hardware Specification	Yes	We conducted timing experiments on a Mac Mini with an Apple M1 processor and 16GB of memory to measure the computational time required for the full-sample (no sketching) and sketched ridgeless least square estimators with orthogonal sketching (implemented through the subsampled randomized Hadamard transform) under isotropic features.
Software Dependencies	No	The paper mentions using Python libraries like NumPy and SciPy, but it does not specify exact version numbers for these or any other key software dependencies required for reproducibility.
Experiment Setup	Yes	We conducted numerical studies with 500 replications. For each replication, we generated $\beta \sim N_p(0, (\alpha^2/p) I_p)$ and created a training dataset $(X, Y)$ with $n = 400$ training samples... [...] We set $\mathrm{SNR} = \alpha/\sigma = 1, 2, 3$ with $(\alpha, \sigma)$ taking $(5, 5)$, $(10, 5)$ and $(15, 5)$, respectively. [...] we varied $\psi$ by taking a grid of $\psi \in (0, 1)$ with $|\psi_i - \psi_{i+1}| = \delta$ for $\delta = 0.05$.
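The Research Type row refers to risk curves for the full-sample ridgeless estimator and for sketched versions with orthogonal and i.i.d. sketching. A minimal NumPy sketch of those estimators follows, assuming the sketched ridgeless estimator is the minimum-norm least-squares fit on the sketched data $(SX, SY)$ with $S \in \mathbb{R}^{m \times n}$. The helper names, the QR-based orthogonal sketch, the $N(0, 1/n)$ scaling of the i.i.d. sketch and the sizes are assumptions for illustration, not code from the authors' repository.

```python
import numpy as np


def ridgeless(X, Y):
    """Minimum-norm least-squares solution pinv(X) @ Y (the ridge lambda -> 0+ limit)."""
    return np.linalg.pinv(X) @ Y


def orthogonal_sketch(m, n, rng):
    """m x n sketch with orthonormal rows (S @ S.T = I_m), via sign-corrected QR."""
    Q, R = np.linalg.qr(rng.standard_normal((n, m)))  # Q: n x m, orthonormal columns
    return (Q * np.sign(np.diag(R))).T


def iid_sketch(m, n, rng):
    """m x n sketch with i.i.d. N(0, 1/n) entries (the 1/n scaling is an assumption)."""
    return rng.standard_normal((m, n)) / np.sqrt(n)


def sketched_ridgeless(X, Y, S):
    """Ridgeless fit computed on the sketched data (S X, S Y)."""
    return ridgeless(S @ X, S @ Y)


rng = np.random.default_rng(0)
n, p, m = 400, 100, 200                            # illustrative sizes (m / n = 0.5)
X = rng.standard_normal((n, p))                    # isotropic features
beta = rng.normal(0.0, 5.0 / np.sqrt(p), size=p)   # alpha = 5
Y = X @ beta + rng.normal(0.0, 5.0, size=n)        # sigma = 5 (Gaussian noise assumed)
S_orth = orthogonal_sketch(m, n, rng)
S_iid = iid_sketch(m, n, rng)

for name, b in [("full", ridgeless(X, Y)),
                ("orthogonal", sketched_ridgeless(X, Y, S_orth)),
                ("i.i.d.", sketched_ridgeless(X, Y, S_iid))]:
    # With isotropic features, the out-of-sample risk given the fit is ||b - beta||^2.
    print(name, np.sum((b - beta) ** 2))
```

Because the pseudoinverse is unchanged (up to the inverse factor) by rescaling its argument, the overall scale of $S$ does not affect the sketched ridgeless fit; only the row space of the sketch matters.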
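The Open Datasets row quotes two synthetic feature designs: rows i.i.d. $N(0, I_p)$ and rows i.i.d. $N_p(0, \Sigma)$. A minimal sketch of drawing such features with NumPy is below. The excerpt does not state which $\Sigma$ the authors used, so the AR(1) covariance $\Sigma_{ij} = \rho^{|i-j|}$, the value $\rho = 0.5$, $p = 200$ and the helper names are hypothetical.

```python
import numpy as np


def ar1_covariance(p, rho):
    """Hypothetical example covariance: Sigma[i, j] = rho ** |i - j|."""
    idx = np.arange(p)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def gaussian_features(n, p, rng, Sigma=None):
    """n x p matrix whose rows are i.i.d. N_p(0, Sigma); Sigma=None means I_p."""
    Z = rng.standard_normal((n, p))
    if Sigma is None:
        return Z
    L = np.linalg.cholesky(Sigma)  # Sigma = L @ L.T
    return Z @ L.T                 # each row x = L z has covariance L L^T = Sigma


rng = np.random.default_rng(1)
X_iso = gaussian_features(400, 200, rng)                            # N(0, I_p) rows
X_cor = gaussian_features(400, 200, rng, ar1_covariance(200, 0.5))  # N_p(0, Sigma) rows
```

Drawing through a Cholesky factor reuses the same standard-normal generator for both designs, so switching between isotropic and correlated features changes only one argument.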
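The Dataset Splits row describes one replication: draw $\beta \sim N_p(0, (\alpha^2/p) I_p)$, then a training set of $n = 400$, a validation set of $n_{\mathrm{val}} \in \{20, 100, 200\}$ and a test set of $n_{\mathrm{new}} = 100$. The sketch below generates these three sets under stated assumptions: isotropic features, the linear model $y = x^\top \beta + \varepsilon$ with Gaussian noise $\varepsilon \sim N(0, \sigma^2)$, and a hypothetical dimension $p = 200$. The function `make_replication` is illustrative, not the authors' code.

```python
import numpy as np


def make_replication(p, alpha, sigma, rng, n=400, n_val=100, n_new=100):
    """One replication: beta ~ N_p(0, alpha^2/p I_p), then train/validation/test sets.

    Assumes isotropic features and y = x^T beta + eps with eps ~ N(0, sigma^2).
    """
    beta = rng.normal(0.0, alpha / np.sqrt(p), size=p)  # sd alpha/sqrt(p) per coordinate

    def draw(size):
        X = rng.standard_normal((size, p))
        y = X @ beta + rng.normal(0.0, sigma, size=size)
        return X, y

    # All three sets share the same beta but use fresh features and noise.
    return beta, draw(n), draw(n_val), draw(n_new)


rng = np.random.default_rng(2)
for n_val in (20, 100, 200):
    beta, (X, Y), (X_val, y_val), (X_new, y_new) = make_replication(
        p=200, alpha=5.0, sigma=5.0, rng=rng, n_val=n_val)
    print(n_val, X.shape, X_val.shape, X_new.shape)
```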
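The Hardware Specification row says orthogonal sketching was implemented through the subsampled randomized Hadamard transform (SRHT). The sketch below uses one standard construction, $S = P (H_N/\sqrt{N}) D$: $D$ flips row signs at random, $H_N$ is the $N \times N$ Walsh-Hadamard matrix and $P$ keeps $m$ of the $N$ rows without replacement, so $S S^\top = I_m$ when $n = N$ is a power of two. When $n$ is not a power of two (for example $n = 400$), the code zero-pads to the next power of two, and the effective $m \times n$ sketch is then only approximately orthogonal. The helpers `fwht` and `srht`, the sizes and the timing harness are hypothetical; measured times depend on the machine.

```python
import time

import numpy as np


def fwht(A):
    """Unnormalized fast Walsh-Hadamard transform of the rows of A (returns H_N @ A)."""
    A = np.array(A, dtype=float)  # work on a contiguous copy
    N = A.shape[0]
    if N & (N - 1):
        raise ValueError("number of rows must be a power of two")
    h = 1
    while h < N:
        # View rows as blocks of size 2h and butterfly the two halves of each block.
        B = A.reshape(N // (2 * h), 2, h, -1)
        top = B[:, 0].copy()
        B[:, 0] = top + B[:, 1]
        B[:, 1] = top - B[:, 1]
        h *= 2
    return A


def srht(X, Y, m, rng):
    """Sketch (X, Y) with S = P (H_N / sqrt(N)) D, zero-padding n rows up to N = 2^k."""
    n, p = X.shape
    N = 1 << (n - 1).bit_length()                  # smallest power of two >= n
    Z = np.zeros((N, p + 1))
    Z[:n, :p] = X                                  # stack [X | Y] so both get the same S
    Z[:n, p] = Y
    Z *= rng.choice([-1.0, 1.0], size=N)[:, None]  # D: random sign flips
    Z = fwht(Z) / np.sqrt(N)                       # H_N / sqrt(N) is orthogonal
    rows = rng.choice(N, size=m, replace=False)    # P: keep m distinct rows
    return Z[rows, :p], Z[rows, p]


rng = np.random.default_rng(3)
n, p, m = 4096, 200, 1024                          # hypothetical sizes
X = rng.standard_normal((n, p))
Y = X @ rng.normal(0.0, 1.0 / np.sqrt(p), size=p) + rng.standard_normal(n)

t0 = time.perf_counter()
beta_full = np.linalg.pinv(X) @ Y                  # full-sample ridgeless fit
t_full = time.perf_counter() - t0

t0 = time.perf_counter()
SX, SY = srht(X, Y, m, rng)
beta_sketch = np.linalg.pinv(SX) @ SY              # SRHT-sketched ridgeless fit
t_sketch = time.perf_counter() - t0

print(f"full: {t_full:.4f} s, SRHT-sketched (m={m}): {t_sketch:.4f} s")
```

The transform costs $O(N \log N)$ per column rather than the $O(mN)$ of multiplying by a dense sketch, and in this pure-NumPy version the Python loop runs only $\log_2 N$ times.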
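The Experiment Setup row combines three SNR settings with a grid of $\psi$ values spaced $\delta = 0.05$ apart in $(0, 1)$. A scaled-down sketch of that loop follows, assuming $\psi = m/n$ is the sketch-size ratio, isotropic features, Gaussian noise and orthogonal sketching, and using $\|\hat\beta - \beta\|_2^2$ as the out-of-sample risk (for $x \sim N(0, I_p)$ this equals $E[(x^\top \hat\beta - x^\top \beta)^2]$ given $\hat\beta$ and $\beta$). The dimension $p = 100$ and the 5 replications (instead of the paper's 500) are hypothetical choices to keep the run short.

```python
import numpy as np


def orthogonal_sketch(m, n, rng):
    """m x n sketch with orthonormal rows via sign-corrected QR of a Gaussian matrix."""
    Q, R = np.linalg.qr(rng.standard_normal((n, m)))
    return (Q * np.sign(np.diag(R))).T


def risk_curve(p, alpha, sigma, psis, reps, rng, n=400):
    """Monte Carlo average of ||beta_hat - beta||^2 for each psi in the grid."""
    risks = np.zeros(len(psis))
    for _ in range(reps):
        beta = rng.normal(0.0, alpha / np.sqrt(p), size=p)  # beta ~ N_p(0, alpha^2/p I_p)
        X = rng.standard_normal((n, p))
        Y = X @ beta + rng.normal(0.0, sigma, size=n)
        for k, psi in enumerate(psis):
            m = int(round(psi * n))                          # sketch size, assuming psi = m / n
            S = orthogonal_sketch(m, n, rng)
            beta_hat = np.linalg.pinv(S @ X) @ (S @ Y)       # sketched ridgeless fit
            risks[k] += np.sum((beta_hat - beta) ** 2) / reps
    return risks


delta = 0.05
psis = np.round(delta * np.arange(1, round(1 / delta)), 2)   # 0.05, 0.10, ..., 0.95
settings = {1: (5.0, 5.0), 2: (10.0, 5.0), 3: (15.0, 5.0)}   # SNR -> (alpha, sigma)

rng = np.random.default_rng(4)
curves = {snr: risk_curve(100, a, s, psis, reps=5, rng=rng)
          for snr, (a, s) in settings.items()}
```

Building the grid from integer multiples of $\delta$ avoids the floating-point endpoint ambiguity of `np.arange(0.05, 1.0, 0.05)`; plotting `curves[snr]` against `psis` gives an empirical risk curve for each SNR setting under the assumptions above.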