Loss Landscape Characterization of Neural Networks without Over-Parametrization
Authors: Rustem Islamov, Niccolò Ajroldi, Antonio Orvieto, Aurelien Lucchi
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Finally, we validate the soundness of our new function class through both theoretical analysis and empirical experimentation across a diverse range of deep learning models." and Section 5, "Experimental validation of the α-β-condition" |
| Researcher Affiliation | Academia | Rustem Islamov¹, Niccolò Ajroldi², Antonio Orvieto²,³,⁴, Aurelien Lucchi¹; ¹University of Basel, ²Max Planck Institute for Intelligent Systems, ³ELLIS Institute Tübingen, ⁴Tübingen AI Center |
| Pseudocode | Yes | Algorithm 1 (SGD with constant stepsize), Algorithm 2 (SPS_max: Stochastic Polyak Stepsize), Algorithm 3 (NGN: Non-negative Gauss-Newton); see the stepsize sketch below the table |
| Open Source Code | No | The paper refers to using and modifying existing open-source code but does not explicitly state that its own modified code or new code for this work is provided as open-source. |
| Open Datasets | Yes | Fashion-MNIST [83], CIFAR10 [41], CIFAR100 [41], Criteo 1TB [44], FastMRI [85], OGBG [31], and WMT [9] datasets; Pythia models [8]; SlimPajama [72] dataset |
| Dataset Splits | No | The paper refers to using standard datasets but does not explicitly provide the training, validation, or test dataset splits (e.g., percentages or counts) or cite a specific split methodology used for reproducibility. |
| Hardware Specification | Yes | LSTM, MLP, CNN and ResNet experiments are performed using one NVIDIA GeForce RTX 3090 GPU with a memory of 24 GB. For training AlgoPerf and Pythia language models, we resort instead to 4x A100-SXM4 GPUs, with a memory of 40 GB each, and employ data parallelism for efficient distributed training. |
| Software Dependencies | No | The paper mentions using the 'PyTorch [65] package' but does not provide specific version numbers for it or any other key software dependencies required for replication. |
| Experiment Setup | Yes | "We train the model in all cases with fixed learning rate 0.09 for 1500 epochs and batch size 64 on Fashion-MNIST [83] dataset." and Table 3 ("Training details of large models") from Appendix D.8 and Appendix D.9; see the training-loop sketch below the table |
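The three stepsize rules listed under Pseudocode admit compact PyTorch implementations. The sketch below is not the authors' code: the SPS_max and NGN updates follow the published Stochastic Polyak Stepsize and Non-negative Gauss-Newton rules rather than the paper's Algorithms 2 and 3 verbatim, and the hyperparameter names (`c`, `gamma_max`, `f_star`, `sigma`) are illustrative assumptions.

```python
# Minimal PyTorch sketch of the three update rules named in Algorithms 1-3.
# Not the authors' implementation; hyperparameter names are assumptions.
import torch

def sgd_constant_step(params, loss, lr=0.09):
    """Algorithm 1 style: one SGD step with a fixed learning rate."""
    grads = torch.autograd.grad(loss, params)
    with torch.no_grad():
        for p, g in zip(params, grads):
            p -= lr * g

def sps_max_step(params, loss, c=0.5, gamma_max=1.0, f_star=0.0):
    """SPS_max (Stochastic Polyak Stepsize):
    gamma_t = min((f_i(x_t) - f_i*) / (c * ||grad f_i(x_t)||^2), gamma_max)."""
    grads = torch.autograd.grad(loss, params)
    grad_sq = sum(g.pow(2).sum() for g in grads)
    gamma = torch.clamp((loss.detach() - f_star) / (c * grad_sq + 1e-12), max=gamma_max)
    with torch.no_grad():
        for p, g in zip(params, grads):
            p -= gamma * g

def ngn_step(params, loss, sigma=1.0):
    """NGN (Non-negative Gauss-Newton) stepsize, assuming a non-negative loss:
    gamma_t = sigma / (1 + sigma * ||grad f_i(x_t)||^2 / (2 * f_i(x_t)))."""
    grads = torch.autograd.grad(loss, params)
    grad_sq = sum(g.pow(2).sum() for g in grads)
    gamma = sigma / (1.0 + sigma * grad_sq / (2.0 * loss.detach() + 1e-12))
    with torch.no_grad():
        for p, g in zip(params, grads):
            p -= gamma * g
```

Each function takes the list of trainable parameters and the current mini-batch loss (with the graph still attached), so all three rules can be swapped into the same training loop for comparison.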
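For the Fashion-MNIST run quoted under Experiment Setup, only the fixed learning rate (0.09), the number of epochs (1500), and the batch size (64) come from the paper; everything else in the following sketch (the torchvision data pipeline, the model, the device handling) is an assumed reconstruction, not the authors' setup.

```python
# Hedged reconstruction of the small-scale Fashion-MNIST run: lr=0.09,
# epochs=1500, and batch_size=64 are taken from the paper; the rest is assumed.
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

def train_fashion_mnist(model, epochs=1500, lr=0.09, batch_size=64):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    loader = DataLoader(
        datasets.FashionMNIST("data", train=True, download=True,
                              transform=transforms.ToTensor()),
        batch_size=batch_size, shuffle=True)
    opt = torch.optim.SGD(model.parameters(), lr=lr)  # constant stepsize, no schedule
    loss_fn = nn.CrossEntropyLoss()
    model.to(device)
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model
```

A hypothetical call such as `train_fashion_mnist(nn.Sequential(nn.Flatten(), nn.Linear(784, 10)))` would exercise the loop; the paper's actual MLP/CNN architectures are described in its appendix rather than here.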