On the Optimal Weighted $\ell_2$ Regularization in Overparameterized Linear Regression
Authors: Denny Wu, Ji Xu
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We illustrate the results of Theorem 1 in Figure 2 (noiseless case) and Figure 8 (noisy case) for both discrete and continuous designs of $d_x$ and $d_\beta$, with $\Sigma_x = \mathrm{diag}(d_x)$, $\Sigma_\beta = \mathrm{diag}(d_\beta)$ and $\Sigma_w = I$ (see design details in Appendix D). In Figure 2, we plot the prediction risk of all three joint relations defined above (see Appendix D for details). In Figure 4 we confirm our findings in Theorem 4 (for additional results on different distributions see Figure 10). Specifically, we set $\Sigma_w = I$, $\Sigma_x = \mathrm{diag}(d_x)$ and $\Sigma_\beta = \Sigma_x^\alpha$. Theorem 10 is supported by Figure 6, where we plot the prediction risk of the generalized ridge regression estimator under different $\Sigma_w$ and optimally tuned $\lambda_{\mathrm{opt}}$. We demonstrate the effectiveness of this heuristic in Figure 7. |
| Researcher Affiliation | Academia | Denny Wu: University of Toronto and Vector Institute (dennywu@cs.toronto.edu); Ji Xu: Columbia University (jixu@cs.columbia.edu) |
| Pseudocode | No | The paper focuses on mathematical derivations and analysis, and does not include any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not mention releasing any source code or provide any links to a code repository for the described methodology. |
| Open Datasets | No | The paper analyzes a linear model under a random design with general data covariance and verifies its theoretical results with simulations. All experimental data is synthetic, generated from specified model parameters, so the paper neither uses nor provides access to a publicly available dataset. |
| Dataset Splits | No | The paper consists of theoretical analysis and simulations driven by specified parameters (e.g., $p/n = \gamma$, $n = 300$, $p = 600$). It does not define training, validation, or test splits, since such splits apply to pre-existing datasets rather than to synthetic data generated from model parameters. |
| Hardware Specification | No | The paper does not mention any specific hardware (e.g., CPU, GPU models, or cloud resources) used for running its simulations or experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python, PyTorch, specific libraries or solvers) used for its analysis or simulations. |
| Experiment Setup | Yes | Figure 2: Finite sample prediction risk... We set $\gamma = 2$ and $(n, p) = (300, 600)$. Figure 4: We set $\Sigma_w = I$ and $\Sigma_\beta = \Sigma_x^\alpha$, where $d_x$ has two point masses on 1 and 5 with probabilities 3/4 and 1/4, respectively. Left: optimal $\lambda$; solid lines represent the noiseless case $\sigma = 0$ and dashed lines represent SNR $\xi = 5$. Figure 6: Left: $d_x$ has 4 point masses (1, 2, 3, 4) with equal probabilities and $d_\beta$ has 2 point masses on 1 and 5 with probabilities 3/4 and 1/4, respectively; Right: $d_x$ has 2 point masses on 1 and 5 with probabilities 3/4 and 1/4, respectively, and $\Sigma_\beta = \Sigma_x^2$; we set $\Sigma_w = \Sigma_\beta^\alpha$. Noiseless case, $\sigma = 0$. |
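
Since no code is released, the following is a minimal sketch (not the authors' implementation) of the Figure 4 style design listed above: $d_x$ with point masses on 1 and 5 (probabilities 3/4 and 1/4), $\Sigma_\beta = \Sigma_x^\alpha$, and a weighted ridge estimator swept over a grid of $\lambda$. It assumes the estimator normalization $\hat\beta_\lambda = (X^\top X + n\lambda\Sigma_w)^{-1}X^\top y$, a prior scaling $\beta \sim \mathcal{N}(0, \Sigma_\beta/p)$, and measures the excess risk $(\hat\beta - \beta)^\top\Sigma_x(\hat\beta - \beta)$. The helper names (`sample_point_mass`, `weighted_ridge`, `excess_risk`, `simulate`) and the `w_power` parameter are hypothetical, and a single random draw is used rather than an average over trials.

```python
import numpy as np


def sample_point_mass(values, probs, size, rng):
    """Draw `size` i.i.d. diagonal entries from a discrete (point-mass) law."""
    return rng.choice(np.asarray(values, dtype=float), size=size, p=probs)


def weighted_ridge(X, y, lam, w_diag):
    """Generalized ridge estimate with Sigma_w = diag(w_diag).

    Assumed normalization: beta_hat = (X^T X + n * lam * Sigma_w)^{-1} X^T y.
    """
    n = X.shape[0]
    A = X.T @ X + n * lam * np.diag(w_diag)
    return np.linalg.solve(A, X.T @ y)


def excess_risk(beta_hat, beta, dx):
    """(beta_hat - beta)^T Sigma_x (beta_hat - beta) for Sigma_x = diag(dx)."""
    diff = beta_hat - beta
    return float(diff @ (dx * diff))


def simulate(n=300, p=600, alpha=1.0, w_power=0.0, sigma=0.0, lam_grid=None, seed=0):
    """One random draw of a Figure 4 style design.

    Returns (lam_grid, risks, lam_best). `w_power` is a hypothetical knob:
    Sigma_w = Sigma_beta ** w_power, so w_power = 0 gives Sigma_w = I.
    """
    rng = np.random.default_rng(seed)
    if lam_grid is None:
        lam_grid = np.logspace(-3, 1, 20)
    # d_x: point masses on 1 and 5 with probabilities 3/4 and 1/4.
    dx = sample_point_mass([1.0, 5.0], [0.75, 0.25], p, rng)
    dbeta = dx ** alpha                      # Sigma_beta = Sigma_x^alpha
    w_diag = dbeta ** w_power                # Sigma_w (identity when w_power = 0)
    # Assumed prior scaling: beta ~ N(0, Sigma_beta / p).
    beta = rng.standard_normal(p) * np.sqrt(dbeta / p)
    # Scaling column j by sqrt(dx[j]) gives rows x_i ~ N(0, diag(dx)).
    X = rng.standard_normal((n, p)) * np.sqrt(dx)
    y = X @ beta + sigma * rng.standard_normal(n)
    risks = np.array([
        excess_risk(weighted_ridge(X, y, lam, w_diag), beta, dx) for lam in lam_grid
    ])
    return lam_grid, risks, lam_grid[int(np.argmin(risks))]


if __name__ == "__main__":
    lams, risks, lam_best = simulate()  # gamma = p / n = 2, noiseless
    print(f"empirical best lambda on the grid: {lam_best:.4g}, risk {risks.min():.4f}")
```

Calling `simulate(alpha=2.0, w_power=a)` for some exponent `a` mimics the structure of the Figure 6 (right) setting, where $\Sigma_\beta = \Sigma_x^2$ and $\Sigma_w = \Sigma_\beta^\alpha$; the grid minimizer is only an empirical stand-in for $\lambda_{\mathrm{opt}}$, and curves would need averaging over seeds before any comparison with the paper's figures.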