Penalising the biases in norm regularisation enforces sparsity
Authors: Etienne Boursier, Nicolas Flammarion
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, the significance of bias term regularisation in achieving sparser estimators during neural network training is illustrated on toy examples in Section 6. This section compares, through Figure 3, the estimators obtained with and without counting the bias terms in the regularisation, when training a one-hidden-layer ReLU neural network. |
| Researcher Affiliation | Academia | Etienne Boursier INRIA CELESTE, LMO, Orsay, France etienne.boursier@inria.fr Nicolas Flammarion TML Lab, EPFL, Switzerland nicolas.flammarion@epfl.ch |
| Pseudocode | No | The paper contains mathematical derivations and proofs, but no structured pseudocode or algorithm blocks are present. |
| Open Source Code | Yes | The code is made available at github.com/eboursier/penalising_biases. |
| Open Datasets | No | The paper mentions using 'toy examples' for illustration, but it does not provide specific dataset names, citations, or links for public access. |
| Dataset Splits | No | The paper uses 'toy examples' for illustration and does not provide specific details on dataset splits (e.g., train/validation/test percentages or counts). |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU, CPU models, or memory) used to run the experiments. |
| Software Dependencies | No | The paper discusses training neural networks but does not provide specific software names with version numbers (e.g., PyTorch 1.9, Python 3.8). |
| Experiment Setup | Yes | For this experiment, we train neural networks by minimising the empirical loss, regularised with the ℓ2 norm of the parameters (either with or without the bias terms) with a regularisation factor λ = 10⁻³. Each neural network has m = 200 hidden neurons and all parameters are initialised i.i.d. as centered Gaussian variables of variance 1/√m (similar results are observed for larger initialisation scales). |
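
As a rough illustration of the setup quoted above, the following is a minimal PyTorch sketch, not the authors' released code (see github.com/eboursier/penalising_biases for that). The loss function, optimiser, learning rate and number of steps are assumptions; only the regularisation factor, hidden width and initialisation scale come from the quoted setup.

```python
import torch
import torch.nn as nn

def train(X, y, m=200, lam=1e-3, penalise_biases=True, steps=10_000, lr=1e-2):
    """Train a one-hidden-layer ReLU network with an explicit ell_2 penalty.

    The penalty is applied to all parameters when `penalise_biases` is True,
    and to the weight matrices only otherwise.
    """
    d = X.shape[1]
    model = nn.Sequential(nn.Linear(d, m), nn.ReLU(), nn.Linear(m, 1))

    # i.i.d. centered Gaussian initialisation with variance 1/sqrt(m),
    # i.e. standard deviation m**(-1/4), as described in the setup.
    with torch.no_grad():
        for p in model.parameters():
            p.normal_(mean=0.0, std=m ** -0.25)

    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # Squared loss is an assumption; the paper's toy examples may differ.
        loss = nn.functional.mse_loss(model(X), y.reshape(-1, 1))
        for name, p in model.named_parameters():
            if penalise_biases or "bias" not in name:
                loss = loss + lam * p.pow(2).sum()
        loss.backward()
        opt.step()
    return model
```

Comparing the networks returned with `penalise_biases=True` and `penalise_biases=False` on the same toy data mirrors the comparison reported in Figure 3 of the paper, where including the bias terms in the penalty yields sparser estimators.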