Topological obstruction to the training of shallow ReLU neural networks

Authors: Marco Nurisso, Pierrick Leroy, Francesco Vaccarino

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 'This paper reveals the presence of topological obstruction in the loss landscape of shallow ReLU neural networks trained using gradient flow. ... We validate this result with numerical experiments.' (Section 6: Empirical Validation)
Researcher Affiliation | Academia | 'Marco Nurisso, Politecnico di Torino & CENTAI Institute, Torino, 10100, Italy, marco.nurisso@polito.it'
Pseudocode | No | The paper does not contain pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The NeurIPS checklist within the paper states: 'Question: Does the paper provide open access to the data and code... Answer: [No] Justification: Given the simplicity of our setup, we think that releasing the code would be unnecessary.'
Open Datasets | Yes | 'Next, we generate a dataset of 8000 points (x_i, F(x_i)) by sampling x_i ~ U([0,1]^2). ... We consider a simple binary classification task on the well-known breast cancer dataset [49]'
Dataset Splits | No | The paper generates a dataset of 8000 points for the toy example and uses the breast cancer dataset, but it does not specify explicit train/validation/test splits, percentages, or sample counts for either dataset.
Hardware Specification | No | The main text does not describe the specific hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper names the loss functions (MSE, BCE) and the optimization method (gradient descent) but does not list specific software libraries with version numbers (e.g., PyTorch 1.9, TensorFlow 2.x) used for the experiments.
Experiment Setup | Yes | 'Our model, depicted in Figure 3a) is a one hidden layer neural network with 2 hidden neurons, ReLU activations and no biases. All the weights are initialized by independently sampling from U([-2,2]). ... Finally, we train the network using gradient descent on the MSE loss with a small learning rate of η = 0.01.' (A minimal reproduction sketch follows the table.)
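
Since the paper releases no code, the following is a minimal sketch of the toy experiment quoted in the Open Datasets and Experiment Setup rows, written here in PyTorch. The target function F, the output dimension, the number of gradient steps, and the random seed are not specified in the quoted excerpts, so the placeholders below are assumptions for illustration only, not the authors' implementation.

import torch

torch.manual_seed(0)  # seed not specified in the paper; fixed here for repeatability

# Dataset: 8000 points x_i ~ U([0,1]^2). The target function F is not given in the
# quoted excerpt; the function below is a hypothetical stand-in.
n_points = 8000
x = torch.rand(n_points, 2)

def F(x):
    # placeholder scalar target, NOT the paper's F
    return (x[:, 0] * x[:, 1]).unsqueeze(1)

y = F(x)

# Model: one hidden layer, 2 hidden neurons, ReLU activations, no biases
# (scalar output assumed).
model = torch.nn.Sequential(
    torch.nn.Linear(2, 2, bias=False),
    torch.nn.ReLU(),
    torch.nn.Linear(2, 1, bias=False),
)

# Initialization: every weight drawn independently from U([-2, 2]).
with torch.no_grad():
    for p in model.parameters():
        p.uniform_(-2.0, 2.0)

# Training: full-batch gradient descent on the MSE loss, learning rate 0.01.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.MSELoss()

for step in range(1000):  # number of steps not stated in the excerpt
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    if step % 100 == 0:
        print(f"step {step:4d}  loss {loss.item():.4f}")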