No Wrong Turns: The Simple Geometry Of Neural Networks Optimization Paths
Authors: Charles Guille-Escuret, Hiroki Naganuma, Kilian Fatras, Ioannis Mitliagkas
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct our experiments on image classification, semantic segmentation and language modeling across different batch sizes, network architectures, datasets, optimizers, and initialization seeds. We discuss the impact of each factor. |
| Researcher Affiliation | Collaboration | Mila, Montreal, Canada; Université de Montréal, Montreal, Canada; McGill University, Montreal, Canada; Dreamfold; Archimedes Unit, Athena Research Center, Athens. |
| Pseudocode | Yes | A detailed description of our experimental protocol is provided in Algorithm 1 in Appendix A.1, and we share our code at https://github.com/Hiroki11x/Loss Landscape Geometry. |
| Open Source Code | Yes | Our code can be found at the link below. https://github.com/Hiroki11x/Loss Landscape Geometry |
| Open Datasets | Yes | The CIFAR-10 dataset (Krizhevsky et al., 2012), one of the most widely used datasets for machine learning research... ImageNet-1K (Deng et al., 2009)... WikiText-2 dataset (Logan et al., 2019)... The Vaihingen dataset (Rottensteiner et al., 2012) |
| Dataset Splits | Yes | The dataset is split into two segments: a training set comprising 50,000 images and a test set of 10,000 images. (CIFAR-10) ... The dataset is divided into three segments: a training set with roughly 2.08 million tokens, a validation set with approximately 217,000 tokens, and a test set with about 245,000 tokens. (WikiText-2) ... It is composed of 33 tiles; we use 11 tiles for training, 5 tiles for validation, and the remaining 17 tiles for testing our model. (Vaihingen) |
| Hardware Specification | Yes | For cluster A, each node is composed of four NVIDIA A100 GPUs and two AMD Milan 7413 CPUs (2.65 GHz, 128 MB L3 cache). |
| Software Dependencies | Yes | As a software environment, we use Rocky Linux 8.7, gcc 9.3.0, Python 3.10.2, PyTorch 1.13.1, torchvision 0.14.1, cuDNN 8.2.0, and CUDA 11.4. |
| Experiment Setup | Yes | The batch size and the total number of epochs for each dataset and model are listed in Table 1. ... Further, we present detailed settings of specific ablation experiments in Tables 2, 3, 4, and 5. |
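
The Dataset Splits row gives split sizes but does not say which Vaihingen tiles go into which split. The sketch below encodes the quoted counts and checks that they add up (11 + 5 + 17 = 33 tiles; 50,000 + 10,000 = 60,000 CIFAR-10 images). The helper `split_tiles` and the sequential assignment of placeholder tile IDs are hypothetical. Substitute the actual tile identifiers from the authors' code.

```python
from typing import Dict, List, Sequence

# Split sizes quoted in the Dataset Splits row.
CIFAR10_SPLITS = {"train": 50_000, "test": 10_000}       # images
VAIHINGEN_SPLITS = {"train": 11, "val": 5, "test": 17}   # tiles
VAIHINGEN_TOTAL_TILES = 33


def split_tiles(tile_ids: Sequence[str], sizes: Dict[str, int]) -> Dict[str, List[str]]:
    """Partition tile IDs into consecutive, disjoint splits of the given sizes.

    Hypothetical helper: tiles are assigned in the order given, which is not
    necessarily the assignment used in the paper.
    """
    total = sum(sizes.values())
    if total != len(tile_ids):
        raise ValueError(f"split sizes sum to {total}, but {len(tile_ids)} tiles were given")
    splits: Dict[str, List[str]] = {}
    start = 0
    for name, size in sizes.items():
        splits[name] = list(tile_ids[start:start + size])
        start += size
    return splits


if __name__ == "__main__":
    # Placeholder IDs; replace with the real Vaihingen tile identifiers.
    tiles = [f"tile_{i:02d}" for i in range(VAIHINGEN_TOTAL_TILES)]
    splits = split_tiles(tiles, VAIHINGEN_SPLITS)
    print({name: len(ids) for name, ids in splits.items()})  # {'train': 11, 'val': 5, 'test': 17}
    print(sum(CIFAR10_SPLITS.values()))  # 60000
```

If the sizes do not match the number of tiles, the helper raises `ValueError`. This catches a miscounted split before any training starts.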
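
The Software Dependencies row pins exact versions. Below is a minimal sketch for checking a local environment against the pinned Python packages. It uses only the standard library (`importlib.metadata`) and drops PEP 440 local build tags such as `+cu117` before comparing. The OS, compiler, cuDNN, and CUDA versions are left to other checks. The names `EXPECTED`, `installed_version`, and `mismatches` are illustrative and are not part of the authors' code.

```python
import platform
from importlib import metadata
from typing import Dict, Optional, Tuple

# Pinned versions from the Software Dependencies row. Only pip-installable
# packages are checked; Rocky Linux, gcc, cuDNN, and CUDA need separate checks.
EXPECTED: Dict[str, str] = {"torch": "1.13.1", "torchvision": "0.14.1"}
EXPECTED_PYTHON = "3.10.2"


def installed_version(package: str) -> Optional[str]:
    """Return the installed distribution version, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def mismatches(
    expected: Dict[str, str], found: Dict[str, Optional[str]]
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package whose version differs from the pin to (expected, found).

    A PEP 440 local tag such as '+cu117' is dropped before comparing.
    """
    out: Dict[str, Tuple[str, Optional[str]]] = {}
    for pkg, want in expected.items():
        have = found.get(pkg)
        public = have.split("+", 1)[0] if have else None
        if public != want:
            out[pkg] = (want, have)
    return out


if __name__ == "__main__":
    found = {pkg: installed_version(pkg) for pkg in EXPECTED}
    print(f"python {platform.python_version()} (pinned {EXPECTED_PYTHON})")
    print(mismatches(EXPECTED, found) or "torch and torchvision match the pins")
```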
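
The Pseudocode row points to Algorithm 1 without reproducing it, so the following is not a transcription of that algorithm. It is a minimal sketch that assumes the protocol measures path geometry through a restricted secant inequality (RSI) and an error bound (EB), both taken relative to the final iterate x_T. Along a gradient-descent trajectory it records two ratios:

- RSI ratio: ⟨∇f(x_t), x_t − x_T⟩ / ‖x_t − x_T‖²
- EB ratio: ‖∇f(x_t)‖ / ‖x_t − x_T‖

A positive RSI ratio means the descent direction −∇f(x_t) has a positive component toward the endpoint. The toy quadratic, its curvatures, the step size, and the helper names (`gradient_descent`, `path_ratios`, `quad_grad`) are all hypothetical. In the paper's setting, the gradients and parameters would come from a neural network.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Grad = Callable[[Vector], Vector]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def gradient_descent(grad: Grad, x0: Vector, lr: float, steps: int) -> List[Vector]:
    """Return the whole trajectory [x_0, x_1, ..., x_T] of plain gradient descent."""
    path = [list(x0)]
    for _ in range(steps):
        x = path[-1]
        path.append([xi - lr * gi for xi, gi in zip(x, grad(x))])
    return path


def path_ratios(grad: Grad, path: List[Vector]) -> List[Tuple[float, float]]:
    """For each iterate before the last, return (RSI ratio, EB ratio) w.r.t. x_T.

    RSI ratio: <grad f(x_t), x_t - x_T> / ||x_t - x_T||^2
    EB ratio:  ||grad f(x_t)|| / ||x_t - x_T||
    """
    x_ref = path[-1]  # assumed reference point: the final iterate
    ratios = []
    for x in path[:-1]:
        diff = [a - b for a, b in zip(x, x_ref)]
        dist_sq = dot(diff, diff)
        if dist_sq == 0.0:
            continue  # iterate equals the reference point; both ratios undefined
        g = grad(x)
        ratios.append((dot(g, diff) / dist_sq, math.sqrt(dot(g, g) / dist_sq)))
    return ratios


# Hypothetical toy objective f(x) = 0.5 * sum(a_i * x_i^2) with diagonal curvatures a_i.
CURVATURES = [1.0, 10.0]


def quad_grad(x: Vector) -> Vector:
    return [a * xi for a, xi in zip(CURVATURES, x)]


if __name__ == "__main__":
    path = gradient_descent(quad_grad, [1.0, -2.0], lr=0.05, steps=50)
    ratios = path_ratios(quad_grad, path)
    print("min RSI ratio:", min(r for r, _ in ratios))
    print("min EB ratio: ", min(e for _, e in ratios))
```

With step size 0.05, every coordinate of this quadratic contracts monotonically toward zero, so both ratios stay at or above the smallest curvature (1.0) along the whole path. A negative RSI ratio at some step would instead flag a step whose descent direction points away from the endpoint.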