Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Penalising the biases in norm regularisation enforces sparsity
Authors: Etienne Boursier, Nicolas Flammarion
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, the significance of bias term regularisation in achieving sparser estimators during neural network training is illustrated on toy examples in Section 6. This section compares, through Figure 3, the estimators that are obtained with and without counting the bias terms in the regularisation, when training a one-hidden ReLU layer neural network. |
| Researcher Affiliation | Academia | Etienne Boursier INRIA CELESTE, LMO, Orsay, France EMAIL Nicolas Flammarion TML Lab, EPFL, Switzerland EMAIL |
| Pseudocode | No | The paper contains mathematical derivations and proofs, but no structured pseudocode or algorithm blocks are present. |
| Open Source Code | Yes | The code is made available at github.com/eboursier/penalising_biases. |
| Open Datasets | No | The paper mentions using 'toy examples' for illustration, but it does not provide specific dataset names, citations, or links for public access. |
| Dataset Splits | No | The paper uses 'toy examples' for illustration and does not provide specific details on dataset splits (e.g., train/validation/test percentages or counts). |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU, CPU models, or memory) used to run the experiments. |
| Software Dependencies | No | The paper discusses training neural networks but does not provide specific software names with version numbers (e.g., PyTorch 1.9, Python 3.8). |
| Experiment Setup | Yes | For this experiment, we train neural networks by minimising the empirical loss, regularised with the ℓ2 norm of the parameters (either with or without the bias terms) with a regularisation factor λ = 10⁻³. Each neural network has m = 200 hidden neurons and all parameters are initialised i.i.d. as centered Gaussian variables of variance 1/√m (similar results are observed for larger initialisation scales). |
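The setup quoted above can be sketched in plain NumPy. This is an illustrative reconstruction, not the authors' released code: the toy data (1-D inputs with target |x|), the learning rate, and the step count are assumptions; only λ = 10⁻³, m = 200, and the Gaussian initialisation with variance 1/√m come from the paper's description. The `penalise_biases` flag toggles whether the hidden biases are counted in the ℓ2 penalty, which is the comparison the paper's Figure 3 makes.

```python
import numpy as np

def train_relu_net(penalise_biases=True, m=200, lam=1e-3,
                   lr=0.05, steps=2000, seed=0):
    """Minimise empirical squared loss + lam * (squared l2 norm of parameters)
    for a one-hidden-layer ReLU network, with or without the bias terms
    included in the penalty. Data, lr, and steps are illustrative choices."""
    rng = np.random.default_rng(seed)
    X = np.linspace(-1.0, 1.0, 20)    # assumed toy 1-D inputs
    y = np.abs(X)                     # assumed piecewise-linear target
    # i.i.d. centered Gaussians with variance 1/sqrt(m) => std = m^(-1/4)
    std = m ** -0.25
    W = rng.normal(0.0, std, m)       # input weights
    b = rng.normal(0.0, std, m)       # hidden biases
    a = rng.normal(0.0, std, m)       # output weights
    n = len(X)
    for _ in range(steps):
        pre = X[:, None] * W + b      # (n, m) pre-activations
        h = np.maximum(pre, 0.0)      # ReLU
        err = h @ a - y               # residuals
        dh = np.outer(err, a) * (pre > 0.0)  # backprop through ReLU
        a -= lr * (h.T @ err / n + lam * a)
        W -= lr * ((dh * X[:, None]).sum(0) / n + lam * W)
        gb = dh.sum(0) / n
        if penalise_biases:
            gb = gb + lam * b         # count bias terms in the penalty
        b -= lr * gb
    final = np.maximum(X[:, None] * W + b, 0.0) @ a
    return float(np.mean((final - y) ** 2)), (W, b, a)
```

Both variants should drive the toy loss well below the trivial zero-predictor baseline (≈ 0.33 here); the paper's point is about the *sparsity* of the resulting estimator, which one would inspect by counting near-zero neurons in `(W, b, a)` after training.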