Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
The Choice of Normalization Influences Shrinkage in Regularized Regression
Authors: Johan Larsson, Jonas Wallin
TMLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our key contributions are: 1. We reveal that class balance in binary features directly affects lasso, ridge, and elastic net estimates, and show that scaling binary features with standard deviation (ridge) or variance (lasso) mitigates these effects at the cost of increased variance. Through extensive empirical analysis, we show that this finding extends to a wide range of settings (Section 4). ... In the following sections we present the results of our experiments. For all simulated data we generate our response vector according to y = Xβ + ε, with ε ∼ Normal(0, σ_ε² I). |
| Researcher Affiliation | Academia | Johan Larsson, Department of Mathematical Sciences, University of Copenhagen, and Department of Statistics, Lund University; Jonas Wallin, Department of Statistics, Lund University |
| Pseudocode | No | We use a coordinate solver from Lasso.jl (Kornblith, 2024) to optimize our models, which we have based on the algorithm outlined by Friedman et al. (2010). |
| Open Source Code | Yes | All experiments were coded using the Julia programming language (Bezanson et al., 2017) and the code is available at https://github.com/jolars/normreg. |
| Open Datasets | Yes | Figure 1: Lasso paths for real datasets using two types of normalization: standardization and maximum absolute value normalization (max abs). For each dataset, we have colored the coefficients if they were among the first five to become non-zero under either of the two normalization schemes. The x-axis shows the steps along the regularization path and the y-axis the estimated coefficients normalized by the maximum magnitude of the coefficients in each case. See Section G for more information about datasets used here. ... Table 5: Details of the real datasets used in the experiments. The median q value refers to the median of the proportion of ones for the binary features in the data. |
| Dataset Splits | Yes | To illustrate this further, we show the estimated coefficients for the same datasets after having fitted the lasso with a penalty strength (λ) set by 5-fold cross-validation repeated 5 times on a 50% training data subset. ... We use standard hold-out validation with equal splits for training, validation, and test sets. We fit a full lasso path, parameterized by a log-spaced grid of 100 values, from λmax (the value of λ at which the first feature enters the model) to 10⁻²λmax on the training set and pick λ based on validation set error. |
| Hardware Specification | No | No specific hardware details are provided for the experimental setup. The paper mentions the use of Julia programming language for coding experiments but does not specify any hardware. |
| Software Dependencies | No | All experiments were coded using the Julia programming language (Bezanson et al., 2017) and the code is available at https://github.com/jolars/normreg. We use a coordinate solver from Lasso.jl (Kornblith, 2024) to optimize our models, which we have based on the algorithm outlined by Friedman et al. (2010). |
| Experiment Setup | Yes | The first 20 features correspond to signals, with β_j = 1, and otherwise we set β_j to 0. Furthermore, we set the class balance of the first 20 features so that it increases geometrically from 0.5 to 0.99. For all other features we pick q_j uniformly at random in [0.5, 0.99]. We estimate the regression coefficients using the lasso, setting λ₁ = 2σ_ε√(2 log p), with σ_ε set to achieve a signal-to-noise ratio (SNR) of 2. ... In order to preserve the comparability for the baseline case when we have perfect class balance, we scale by s_j = 2(1/4)^(1−δ)(q_j − q_j²)^δ. Finally, we set λ to λmax/2 and 2λmax for lasso and ridge regression, respectively. |
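The simulation described in the Experiment Setup row can be sketched as follows. This is a hedged Python approximation, not the authors' code (their experiments use Julia with a coordinate solver from Lasso.jl): the problem dimensions `n`, `p`, the penalty level `lam`, and the minimal cyclic coordinate-descent solver `lasso_cd` are illustrative assumptions; only the data-generating process (binary features with class balance q_j, β_j = 1 for the first 20 features, SNR of 2) and the scaling s_j = 2(1/4)^(1−δ)(q_j − q_j²)^δ follow the excerpts above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Binary design per the Experiment Setup row: 20 signal features whose class
# balance q_j grows geometrically from 0.5 to 0.99; noise features draw q_j
# uniformly in [0.5, 0.99]. Dimensions n, p are illustrative assumptions.
n, p, k = 500, 100, 20
q = np.empty(p)
q[:k] = np.geomspace(0.5, 0.99, k)
q[k:] = rng.uniform(0.5, 0.99, p - k)
X = (rng.uniform(size=(n, p)) < q).astype(float)

beta = np.zeros(p)
beta[:k] = 1.0  # first 20 features are signals with beta_j = 1

# Noise level chosen to hit a signal-to-noise ratio of 2, as in the paper.
snr = 2.0
sigma_eps = np.sqrt(np.var(X @ beta) / snr)
y = X @ beta + rng.normal(0.0, sigma_eps, n)

# Class-balance scaling s_j = 2 (1/4)^(1 - delta) (q_j - q_j^2)^delta, which
# equals 1/2 for every delta at perfect balance (q_j = 1/2). delta = 1 scales
# by variance (lasso); delta = 1/2 by standard deviation (ridge).
delta = 1.0
s = 2.0 * 0.25 ** (1.0 - delta) * (q - q**2) ** delta
X_scaled = X / s

# Minimal cyclic coordinate descent for the lasso objective
# (1/2n)||y - Xb||^2 + lam ||b||_1 -- a stand-in for the Lasso.jl solver.
def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    n, p = X.shape
    b = np.zeros(p)
    r = y.copy()  # residual y - X @ b
    col_ms = (X**2).sum(axis=0) / n  # per-column mean square
    for _ in range(n_iter):
        for j in range(p):
            rho = X[:, j] @ r / n + col_ms[j] * b[j]
            b_new = soft_threshold(rho, lam) / col_ms[j]
            r += X[:, j] * (b[j] - b_new)
            b[j] = b_new
    return b

lam = sigma_eps * np.sqrt(2.0 * np.log(p) / n)  # illustrative penalty level
b_hat = lasso_cd(X_scaled, y, lam)
```

Dividing each column by s_j with δ = 1 mimics the variance scaling the paper recommends for the lasso; setting `delta = 0.5` instead gives the standard-deviation scaling suggested for ridge.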