Does a sparse ReLU network training problem always admit an optimum?

Authors: Quoc-Tung Le, Rémi Gribonval, Elisa Riccietti

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Figure 1 illustrates the behavior of the relative errors on the training and validation sets, and of the sum of the weight-matrix norms, along epochs, using Stochastic Gradient Descent (SGD) with batch size 3000, learning rate 0.1, momentum 0.9, and four different weight decays (the hyperparameter controlling the L2 regularizer) λ ∈ {0, 10⁻⁴, 5·10⁻⁴, 10⁻³}. The case λ = 0 corresponds to the unregularized case. Our training and testing sets each contain P = 10⁵ samples generated independently as xᵢ ~ U([−1, 1]^d) (d = 100) and yᵢ := Axᵢ. We test this algorithm on a one-hidden-layer ReLU network with two 100 × 100 weight matrices.
Researcher Affiliation | Academia | Univ. Lyon, Inria, CNRS, ENS de Lyon, UCB Lyon 1, LIP UMR 5668, F-69342 Lyon, France
Pseudocode | No | The paper describes algorithms verbally (e.g., quantifier elimination, detection algorithm) but does not present them in structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code for reproducible research: "Does a sparse ReLU network training problem always admit an optimum?" Code repository available at https://hal.science/hal-04233925, October 2023.
Open Datasets | No | "Our training and testing sets each contain P = 10⁵ samples generated independently as xᵢ ~ U([−1, 1]^d) (d = 100) and yᵢ := Axᵢ." The paper describes generating its own synthetic dataset and does not provide a link, DOI, or formal citation for a publicly available or open dataset.
Dataset Splits | No | "Our training and testing sets each contain P = 10⁵ samples generated independently as xᵢ ~ U([−1, 1]^d) (d = 100) and yᵢ := Axᵢ." The paper mentions training and testing sets but does not specify a validation set or provide percentages for dataset splits.
Hardware Specification | No | "The authors thank the Blaise Pascal Center (CBP) for the computational means. It uses the SIDUS [27] solution developed by Emmanuel Quemener." The paper mentions general "computational means" but does not provide specific hardware details such as GPU/CPU models, memory, or other specifications used for the experiments.
Software Dependencies | No | "Small toy examples (for example, Example 3.1 with d = 2) can be verified using Z3Prover, software implementing exactly the algorithm in Lemma 3.3." The paper mentions Z3Prover but does not specify a version number for it or for any other software dependency. (A toy Z3 sketch is given after the table.)
Experiment Setup | Yes | Figure 1 illustrates the behavior of the relative errors on the training and validation sets, and of the sum of the weight-matrix norms, along epochs, using Stochastic Gradient Descent (SGD) with batch size 3000, learning rate 0.1, momentum 0.9, and four different weight decays (the hyperparameter controlling the L2 regularizer) λ ∈ {0, 10⁻⁴, 5·10⁻⁴, 10⁻³}. (A hedged PyTorch sketch of this setup is given below.)
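
To make the Experiment Setup row concrete, here is a minimal PyTorch sketch of the Figure 1 training run as quoted above. Only the details reported in the table are taken as given (d = 100, P = 10⁵, yᵢ := Axᵢ, SGD with batch size 3000, learning rate 0.1, momentum 0.9, weight decay λ); the choice of A, the squared-error loss, the absence of biases, the epoch count, and all names are assumptions for illustration, not the authors' released code.

```python
# Hypothetical reconstruction of the Figure 1 setup described in the table.
# Anything not quoted there (A, loss, biases, epoch count) is an assumption.
import torch
import torch.nn as nn

torch.manual_seed(0)

d = 100                       # input/output dimension (quoted: d = 100)
P = 10**5                     # samples per set (quoted: P = 10^5)
A = torch.randn(d, d)         # assumed: the linear target map A is not specified

# Synthetic data: x_i ~ U([-1, 1]^d), y_i := A x_i
X_train = 2 * torch.rand(P, d) - 1
Y_train = X_train @ A.T

# One-hidden-layer ReLU network with two 100 x 100 weight matrices (no bias assumed)
model = nn.Sequential(
    nn.Linear(d, d, bias=False),
    nn.ReLU(),
    nn.Linear(d, d, bias=False),
)

weight_decay = 1e-4           # one value of lambda in {0, 1e-4, 5e-4, 1e-3}
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=weight_decay)
loss_fn = nn.MSELoss()        # assumed squared-error objective

batch_size = 3000
num_epochs = 5                # assumed: the report does not state the epoch count

for epoch in range(num_epochs):
    perm = torch.randperm(P)
    for start in range(0, P, batch_size):
        idx = perm[start:start + batch_size]
        optimizer.zero_grad()
        loss = loss_fn(model(X_train[idx]), Y_train[idx])
        loss.backward()
        optimizer.step()
    # Relative training error, the quantity tracked in Figure 1
    with torch.no_grad():
        rel_err = ((model(X_train) - Y_train).norm() / Y_train.norm()).item()
    print(f"epoch {epoch}: relative training error {rel_err:.4f}")
```

Re-running this loop for each λ in {0, 10⁻⁴, 5·10⁻⁴, 10⁻³} would reproduce the four regularization settings the table describes.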
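The Software Dependencies row refers to verifying small instances with Z3Prover. The snippet below is not Example 3.1 from the paper (its formula is not reproduced in this report); it is only a hypothetical illustration of the kind of closed, quantified real-arithmetic statement Z3 can decide, here one whose infimum (0) is approached but never attained, the phenomenon the paper's title asks about.

```python
# Illustrative z3py check of a closed, quantified statement over the reals.
# The toy formula is an assumption for illustration; it is NOT the formula
# of Example 3.1 from the paper.
from z3 import Real, ForAll, Exists, Implies, And, Solver

x, eps = Real("x"), Real("eps")

# "For every eps > 0 there exists x with 0 < x < eps":
# the value 0 is approached arbitrarily well but never attained.
claim = ForAll([eps], Implies(eps > 0, Exists([x], And(x > 0, x < eps))))

s = Solver()
s.add(claim)
print(s.check())  # prints "sat": the closed formula holds over the reals
```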