Topological obstruction to the training of shallow ReLU neural networks

Authors: Marco Nurisso, Pierrick Leroy, Francesco Vaccarino

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 'This paper reveals the presence of topological obstruction in the loss landscape of shallow ReLU neural networks trained using gradient flow. ... We validate this result with numerical experiments.' (Section 6: Empirical Validation)
Researcher Affiliation | Academia | 'Marco Nurisso, Politecnico di Torino & CENTAI Institute, Torino, 10100, Italy, marco.nurisso@polito.it'
Pseudocode | No | The paper does not contain pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The NeurIPS checklist within the paper states: 'Question: Does the paper provide open access to the data and code... Answer: [No] Justification: Given the simplicity of our setup, we think that releasing the code would be unnecessary.'
Open Datasets | Yes | 'Next, we generate a dataset of 8000 points (x_i, F(x_i)) by sampling x_i ~ U([0,1]^2). ... We consider a simple binary classification task on the well-known breast cancer dataset [49]'
Dataset Splits | No | The paper generates a dataset of 8000 points for the toy example and uses the breast cancer dataset, but it does not specify explicit train/validation/test splits, percentages, or sample counts for either dataset.
Hardware Specification | No | The main text does not describe the specific hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper names the loss functions (MSE, BCE) and the optimization method (gradient descent) but does not list specific software libraries with version numbers (e.g., PyTorch 1.9, TensorFlow 2.x) used for the experiments.
Experiment Setup | Yes | 'Our model, depicted in Figure 3a) is a one hidden layer neural network with 2 hidden neurons, ReLU activations and no biases. All the weights are initialized by independently sampling from U([-2,2]). ... Finally, we train the network using gradient descent on the MSE loss with a small learning rate of η = 0.01.' (A minimal reproduction sketch follows the table.)
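
Since the paper releases no code, the following is a minimal sketch of the toy experiment quoted in the Open Datasets and Experiment Setup rows, written here in PyTorch. The target function F, the output dimension, the number of gradient steps, and the random seed are not specified in the quoted excerpts, so the placeholders below are assumptions for illustration only, not the authors' implementation.

import torch

torch.manual_seed(0)  # seed not specified in the paper; fixed here for repeatability

# Dataset: 8000 points x_i ~ U([0,1]^2). The target function F is not given in the
# quoted excerpt; the function below is a hypothetical stand-in.
n_points = 8000
x = torch.rand(n_points, 2)

def F(x):
    # placeholder scalar target, NOT the paper's F
    return (x[:, 0] * x[:, 1]).unsqueeze(1)

y = F(x)

# Model: one hidden layer, 2 hidden neurons, ReLU activations, no biases
# (scalar output assumed).
model = torch.nn.Sequential(
    torch.nn.Linear(2, 2, bias=False),
    torch.nn.ReLU(),
    torch.nn.Linear(2, 1, bias=False),
)

# Initialization: every weight drawn independently from U([-2, 2]).
with torch.no_grad():
    for p in model.parameters():
        p.uniform_(-2.0, 2.0)

# Training: full-batch gradient descent on the MSE loss, learning rate 0.01.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.MSELoss()

for step in range(1000):  # number of steps not stated in the excerpt
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    if step % 100 == 0:
        print(f"step {step:4d}  loss {loss.item():.4f}")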