Layer-wise linear mode connectivity
Authors: Linara Adilova, Maksym Andriushchenko, Michael Kamp, Asja Fischer, Martin Jaggi
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Based on our empirical and theoretical investigation, we introduce a novel notion of the layer-wise linear connectivity, and show that deep networks do not have layer-wise barriers between them. (A layer-wise interpolation sketch is given below the table.) |
| Researcher Affiliation | Academia | Linara Adilova (Ruhr University Bochum, EPFL); Maksym Andriushchenko (EPFL); Michael Kamp (IKIM UK Essen, RUB, and Monash University); Asja Fischer (Ruhr University Bochum); Martin Jaggi (EPFL) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code for the experiments is published at https://github.com/link-er/layer-wise-lmc. |
| Open Datasets | Yes | CIFAR-10 with ResNet18. We trained a small GPT-like model with 12 layers on Wikitext. We train two fully connected networks with 3 hidden layers on MNIST. MobileNet trained on CIFAR-100. DomainNet dataset (Peng et al., 2019). |
| Dataset Splits | No | The paper mentions 'training set' and 'test set' but does not explicitly describe a separate 'validation' set or its specific split details for reproduction. |
| Hardware Specification | No | The paper does not specify the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper refers to training setups and implementations taken from specific GitHub repositories (e.g., 'https://github.com/epfml/llm-baselines', 'https://github.com/jhoon-oh/FedBABU'), but it does not explicitly list software dependencies with version numbers. |
| Experiment Setup | Yes | We train ResNet18 without normalization layers using a warm-up learning rate schedule: starting from 0.0001 and increasing linearly over 100 epochs to reach 0.05. Afterwards, cosine annealing is used to decay the learning rate. Batch size is 64; training runs for 200 epochs with the SGD optimizer, momentum 0.9, and weight decay 5e-4. For VGG11 the training setup is the following: batch size 128, learning rate 0.05, with a step-wise scheduler multiplying the learning rate by 0.5 every 30 steps. Training is performed for 200 epochs with SGD, momentum 0.9, and weight decay 5e-4. The MobileNet implementation and training hyperparameters were taken from https://github.com/jhoon-oh/FedBABU; in particular, we use batch size 128 and a learning rate of 0.1, decayed by a factor of 0.1 at 50% and 75% of training. Training is done for 320 epochs. (A PyTorch sketch of the ResNet18 recipe is given below the table.) |
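
To make the layer-wise connectivity claim quoted in the Research Type row concrete: the idea is to interpolate only one layer's weights between two trained networks while keeping every other layer fixed, and to measure the resulting loss barrier. The sketch below does that for two PyTorch models with identical architectures. It is not the authors' released code (see https://github.com/link-er/layer-wise-lmc for that); the helper names `interpolate_layer`, `avg_loss`, and `layerwise_barrier` and the choice of 11 interpolation points are illustrative assumptions.

```python
import copy
import torch

@torch.no_grad()
def interpolate_layer(model_a, model_b, layer_name, alpha):
    """Copy of model_a whose parameters under `layer_name` are replaced by the
    linear interpolation (1 - alpha) * w_a + alpha * w_b; all other layers keep
    model_a's weights (the layer-wise path, as opposed to full-model LMC)."""
    model = copy.deepcopy(model_a)
    params_b = dict(model_b.named_parameters())
    for name, param in model.named_parameters():
        if name.startswith(layer_name):
            param.copy_((1 - alpha) * param + alpha * params_b[name])
    return model

@torch.no_grad()
def avg_loss(model, loss_fn, loader, device):
    """Average loss of `model` over all batches in `loader`."""
    model.eval().to(device)
    total, n = 0.0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        total += loss_fn(model(x), y).item() * x.size(0)
        n += x.size(0)
    return total / n

def layerwise_barrier(model_a, model_b, layer_name, loss_fn, loader,
                      device="cpu", steps=11):
    """Loss barrier along the layer-wise path: the largest excess of the
    interpolated loss over the linear baseline between the two endpoints."""
    loss_a = avg_loss(model_a, loss_fn, loader, device)
    loss_b = avg_loss(model_b, loss_fn, loader, device)
    barrier = 0.0
    for i in range(steps):
        alpha = i / (steps - 1)
        mixed = interpolate_layer(model_a, model_b, layer_name, alpha)
        loss_alpha = avg_loss(mixed, loss_fn, loader, device)
        baseline = (1 - alpha) * loss_a + alpha * loss_b
        barrier = max(barrier, loss_alpha - baseline)
    return barrier
```

For torchvision's ResNet18, `layer_name` would be a parameter-name prefix such as `"layer3"`; comparing such per-layer barriers against the barrier of the full-model interpolation is the kind of experiment the table above refers to.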
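The ResNet18 recipe quoted in the Experiment Setup row can be turned into a rough PyTorch sketch. It encodes only the numbers given above (linear warm-up from 0.0001 to 0.05 over 100 epochs, cosine annealing afterwards, batch size 64, 200 epochs, SGD with momentum 0.9 and weight decay 5e-4); using torchvision's ResNet18 with `norm_layer=torch.nn.Identity` on CIFAR-10 is a stand-in assumption for "ResNet18 without normalization layers", not the authors' exact architecture or training script.

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import CosineAnnealingLR, LinearLR, SequentialLR
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

# Numbers quoted for ResNet18 (no normalization layers) on CIFAR-10.
EPOCHS, WARMUP_EPOCHS = 200, 100
PEAK_LR, START_LR = 0.05, 1e-4
BATCH_SIZE = 64

device = "cuda" if torch.cuda.is_available() else "cpu"

# Stand-in for "ResNet18 without normalization layers":
# torchvision's ResNet18 with BatchNorm replaced by identity.
model = models.resnet18(num_classes=10, norm_layer=torch.nn.Identity).to(device)

train_set = datasets.CIFAR10(root="./data", train=True, download=True,
                             transform=transforms.ToTensor())
train_loader = DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True)

optimizer = SGD(model.parameters(), lr=PEAK_LR, momentum=0.9, weight_decay=5e-4)

# Linear warm-up from 1e-4 to 0.05 over the first 100 epochs,
# followed by cosine annealing over the remaining epochs.
warmup = LinearLR(optimizer, start_factor=START_LR / PEAK_LR,
                  end_factor=1.0, total_iters=WARMUP_EPOCHS)
cosine = CosineAnnealingLR(optimizer, T_max=EPOCHS - WARMUP_EPOCHS)
scheduler = SequentialLR(optimizer, schedulers=[warmup, cosine],
                         milestones=[WARMUP_EPOCHS])

criterion = torch.nn.CrossEntropyLoss()
for epoch in range(EPOCHS):
    model.train()
    for x, y in train_loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        criterion(model(x), y).backward()
        optimizer.step()
    scheduler.step()  # schedulers are stepped once per epoch in this sketch
```

The same pattern, with the batch sizes, learning rates, and step-wise or multi-step schedulers quoted above, would cover the VGG11 and MobileNet setups.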