No Wrong Turns: The Simple Geometry Of Neural Networks Optimization Paths

Authors: Charles Guille-Escuret, Hiroki Naganuma, Kilian Fatras, Ioannis Mitliagkas

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct our experiments on image classification, semantic segmentation and language modeling across different batch sizes, network architectures, datasets, optimizers, and initialization seeds. We discuss the impact of each factor.
Researcher Affiliation | Collaboration | (1) Mila, Montreal, Canada; (2) Université de Montréal, Montreal, Canada; (3) University of McGill, Montreal, Canada; (4) Dreamfold; (5) Archimedes Unit, Athena Research Center, Athens.
Pseudocode | Yes | A detailed description of our experimental protocol is provided in Algorithm 1 in Appendix A.1, and we share our code at https://github.com/Hiroki11x/LossLandscapeGeometry.
Open Source Code | Yes | Our code can be found at the link below. https://github.com/Hiroki11x/LossLandscapeGeometry
Open Datasets | Yes | The CIFAR-10 dataset (Krizhevsky et al., 2012), one of the most widely used datasets for machine learning research... ImageNet-1K (Deng et al., 2009)... WikiText-2 dataset (Logan et al., 2019)... The Vaihingen dataset (Rottensteiner et al., 2012)
Dataset Splits | Yes | The dataset is split into two segments: a training set comprising 50,000 images and a test set of 10,000 images. (CIFAR-10) ... The dataset is divided into three segments: a training set with roughly 2.08 million tokens, a validation set with approximately 217,000 tokens, and a test set with about 245,000 tokens. (WikiText-2) ... It is composed of 33 tiles and we use 11 tiles for training, 5 tiles for validation, and the remaining 17 tiles for testing our model
Hardware Specification | Yes | For cluster A, each node is composed of NVIDIA A100 4GPU and AMD Milan 7413 @ 2.65 GHz 128M cache L3 2CPU.
Software Dependencies | Yes | As a software environment, we use Rocky Linux 8.7, gcc 9.3.0, Python 3.10.2, pytorch 1.13.1, torchvision 0.14.1, cuDNN 8.2.0, and CUDA 11.4.
Experiment Setup | Yes | The specifics concerning the batch size and the total number of epochs allocated for each dataset and corresponding model have been exhaustively tabulated in Table 1. ... Further, we present detailed settings of specific ablation experiments in Table 2, 3, 4, and 5.
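
The Software Dependencies row lists exact versions, so a reader trying to reproduce the setup may want to compare a local environment against them. The following is a minimal sketch of such a check and is not taken from the authors' repository. The helper names (`base_version`, `find_mismatches`, `installed_versions`) are hypothetical, and it assumes pytorch was installed under the PyPI distribution name `torch`. Only the Python-level packages are checked; gcc, cuDNN, and CUDA are out of scope here.

```python
import sys
from importlib import metadata

# Versions quoted in the Software Dependencies row. Using "torch" as the
# distribution name for pytorch is an assumption about the install method.
EXPECTED = {"torch": "1.13.1", "torchvision": "0.14.1"}
EXPECTED_PYTHON = (3, 10, 2)


def base_version(version: str) -> str:
    """Drop a local build suffix such as '+cu117' from a version string."""
    return version.split("+", 1)[0]


def find_mismatches(installed: dict, expected: dict = EXPECTED) -> dict:
    """Return {package: (expected, found)} for every package that differs.

    `found` is None when the package is missing from `installed`.
    """
    mismatches = {}
    for name, want in expected.items():
        found = installed.get(name)
        if found is None or base_version(found) != want:
            mismatches[name] = (want, found)
    return mismatches


def installed_versions(names) -> dict:
    """Look up installed distribution versions; missing ones are skipped."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass
    return found


if __name__ == "__main__":
    print("python matches:", sys.version_info[:3] == EXPECTED_PYTHON)
    print("mismatches:", find_mismatches(installed_versions(EXPECTED)))
```

An empty `mismatches` dictionary suggests the Python packages line up with the listed versions. Local build suffixes are ignored on purpose, since the table does not say which CUDA build of pytorch was installed.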