Loss Landscape Characterization of Neural Networks without Over-Parametrization

Authors: Rustem Islamov, Niccolò Ajroldi, Antonio Orvieto, Aurelien Lucchi

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we validate the soundness of our new function class through both theoretical analysis and empirical experimentation across a diverse range of deep learning models." and Section 5, "Experimental validation of the α-β-condition".
Researcher Affiliation | Academia | Rustem Islamov (1), Niccolò Ajroldi (2), Antonio Orvieto (2,3,4), Aurelien Lucchi (1); (1) University of Basel, (2) Max Planck Institute for Intelligent Systems, (3) ELLIS Institute Tübingen, (4) Tübingen AI Center.
Pseudocode | Yes | Algorithm 1: SGD with constant stepsize; Algorithm 2: SPSmax, Stochastic Polyak Stepsize; Algorithm 3: NGN, Non-negative Gauss-Newton (a hedged sketch of these stepsize rules is given below the table).
Open Source Code | No | The paper refers to using and modifying existing open-source code but does not explicitly state that its own modified or newly written code for this work is released as open source.
Open Datasets | Yes | Fashion-MNIST [83], CIFAR10 [41], CIFAR100 [41], Criteo 1TB [44], fastMRI [85], OGBG [31], and WMT [9] datasets; Pythia models [8]; SlimPajama [72] dataset.
Dataset Splits | No | The paper uses standard datasets but does not explicitly provide the training, validation, or test splits (e.g., percentages or counts) or cite a specific split methodology used for reproducibility.
Hardware Specification | Yes | "LSTM, MLP, CNN and ResNet experiments are performed using one NVIDIA GeForce RTX 3090 GPU with a memory of 24 GB. For training AlgoPerf and Pythia language models, we resort instead to 4x A100-SXM4 GPUs, with a memory of 40 GB each, and employ data parallelism for efficient distributed training."
Software Dependencies | No | The paper mentions using the "PyTorch [65] package" but does not provide specific version numbers for it or for any other key software dependency required for replication.
Experiment Setup | Yes | "We train the model in all cases with fixed learning rate 0.09 for 1500 epochs and batch size 64 on Fashion-MNIST [83] dataset." and Table 3, "Training details of large models", from Appendix D.8 and Appendix D.9 (a minimal setup sketch follows after the table).
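The three algorithms listed in the Pseudocode row differ only in how the per-step learning rate is chosen. Below is a minimal sketch of those stepsize rules, not the authors' code: the constants `c`, `gamma_max`, `sigma`, and the `1e-12` stabilizer are illustrative assumptions, and the SPSmax and NGN formulas follow the common formulations (Loizou et al. for SPSmax, assuming per-sample optimal losses near zero; a Gauss-Newton-derived stepsize for NGN), which may differ in detail from the paper's Algorithms 1-3.

```python
# Hedged sketch of the stepsize rules behind Algorithms 1-3 (not the authors' code).
import torch

def constant_sgd_step(params, loss, gamma=0.09):
    """Algorithm 1 style update: x <- x - gamma * grad f_i(x), with fixed gamma."""
    grads = torch.autograd.grad(loss, params)
    with torch.no_grad():
        for p, g in zip(params, grads):
            p -= gamma * g

def sps_max_step(params, loss, c=0.5, gamma_max=1.0):
    """SPSmax-style update, assuming per-sample optimal loss ~ 0:
    gamma_t = min( f_i(x) / (c * ||grad f_i(x)||^2), gamma_max )."""
    grads = torch.autograd.grad(loss, params)
    grad_sq = sum(g.pow(2).sum() for g in grads)
    gamma_t = torch.clamp(loss.detach() / (c * grad_sq + 1e-12), max=gamma_max)
    with torch.no_grad():
        for p, g in zip(params, grads):
            p -= gamma_t * g

def ngn_step(params, loss, sigma=0.1):
    """NGN-style update with a Gauss-Newton-derived stepsize:
    gamma_t = sigma / (1 + sigma * ||grad f_i(x)||^2 / (2 * f_i(x)))."""
    grads = torch.autograd.grad(loss, params)
    grad_sq = sum(g.pow(2).sum() for g in grads)
    gamma_t = sigma / (1.0 + sigma * grad_sq / (2.0 * loss.detach() + 1e-12))
    with torch.no_grad():
        for p, g in zip(params, grads):
            p -= gamma_t * g
```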
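The Experiment Setup row pins down three hyperparameters for the Fashion-MNIST run: a fixed learning rate of 0.09, batch size 64, and 1500 epochs. The sketch below wires those numbers into a standard PyTorch training loop; the two-layer MLP and the `ToTensor` transform are placeholders assumed for illustration, not taken from the paper.

```python
# Hedged sketch of the reported Fashion-MNIST setup; only lr, batch size, and
# epoch count come from the paper's quote, the rest is assumed.
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

def train_fashion_mnist(lr=0.09, batch_size=64, epochs=1500,
                        device="cuda" if torch.cuda.is_available() else "cpu"):
    train_set = datasets.FashionMNIST(
        root="data", train=True, download=True, transform=transforms.ToTensor()
    )
    loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    model = nn.Sequential(  # placeholder architecture, not the paper's model
        nn.Flatten(), nn.Linear(28 * 28, 256), nn.ReLU(), nn.Linear(256, 10)
    ).to(device)
    opt = torch.optim.SGD(model.parameters(), lr=lr)  # fixed learning rate
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model
```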