Distance-Based Regularisation of Deep Networks for Fine-Tuning

Authors: Henry Gouk, Timothy Hospedales, Massimiliano Pontil

Venue: ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical evaluation shows that our algorithm works well, corroborating our theoretical results. It outperforms both state of the art fine-tuning competitors, and penalty-based alternatives that we show do not directly constrain the radius of the search space. Section 5 (Experiments): This section provides an empirical investigation into the predictive performance of the proposed methods relative to existing approaches for regularising fine-tuning, and also conducts experiments to demonstrate which properties of the novel algorithms are responsible for the change in performance.
Researcher Affiliation | Academia | Henry Gouk & Timothy M. Hospedales, School of Informatics, University of Edinburgh ({henry.gouk,t.hospedales}@ed.ac.uk); Massimiliano Pontil, CSML, Istituto Italiano di Tecnologia & Department of Computer Science, UCL (massimiliano.pontil@iit.it)
Pseudocode | Yes | We provide pseudocode in the supplementary material that illustrates how these projections are integrated into the neural network fine-tuning procedure when using a variant of the stochastic subgradient method. (A hedged code sketch of such a projected update appears after this table.)
Open Source Code | Yes | Implementations of the methods used in this paper are available online: https://github.com/henrygouk/mars-finetuning
Open Datasets | Yes | Both networks are pre-trained on the 2012 ImageNet Large Scale Visual Recognition Challenge dataset (Russakovsky et al., 2015). We perform an empirical comparison of our two bounds, along with a bound based on the spectral norm (Long and Sedghi, 2019), to demonstrate the relative tightness. This is done by training neural networks on the MNIST dataset (LeCun et al., 1998).
Dataset Splits | No | The paper mentions 'training and validation folds' and 'Test set accuracy' but does not specify the exact percentages, sample counts, or methods for creating these splits in the main text.
Hardware Specification | No | No specific hardware details (e.g., GPU model, CPU model, memory) are mentioned for running the experiments.
Software Dependencies | No | The paper mentions using the 'Adam optimiser (Kingma and Ba, 2015)' but does not provide version numbers for any software dependencies such as Python, PyTorch, TensorFlow, or CUDA.
Experiment Setup | Yes | The Adam optimiser is used for all experiments (Kingma and Ba, 2015). Information regarding the datasets and hyperparameter optimisation procedure can be found in the supplemental material. Hyperparameters were generated according to λ_j = c·λ̂_j and γ_j = c·γ̂_j, where c is varied and λ̂_j, γ̂_j are the values found during the hyperparameter optimisation process. The network is trained for 15 epochs using the Adam optimiser (Kingma and Ba, 2015), as training any further does not result in any performance increase. The fine-tuning process is repeated five times with different random seeds to measure the robustness of each method to the composition of minibatches and the initialisation of the final linear layer, which is trained from scratch. (A brief sketch of this hyperparameter scaling also appears after this table.)
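
The Pseudocode and Experiment Setup rows above describe projections being applied within a stochastic subgradient fine-tuning loop. The supplementary pseudocode itself is not reproduced on this page; the snippet below is only a minimal PyTorch sketch of the general idea, assuming a Euclidean (Frobenius) distance constraint around the pre-trained weights. The helper names (project_frobenius, finetune_step) and the radii and pretrained_params dictionaries are illustrative assumptions, not the authors' implementation or the API of the linked repository.

import torch

def project_frobenius(param, pretrained, radius):
    # If the parameter has drifted outside the Frobenius-norm ball of the given
    # radius centred at its pre-trained value, rescale the difference back onto
    # the boundary of that ball; otherwise leave it unchanged.
    diff = param - pretrained
    norm = torch.linalg.vector_norm(diff)
    if norm > radius:
        param.copy_(pretrained + diff * (radius / norm))

def finetune_step(model, batch, loss_fn, optimizer, pretrained_params, radii):
    # One projected (sub)gradient step: an ordinary optimiser update followed by
    # a per-layer projection back into the distance-constrained search space.
    # `pretrained_params` maps parameter names to frozen copies of the initial
    # weights; `radii` maps the constrained layers to their distance budgets.
    inputs, targets = batch
    optimizer.zero_grad()
    loss_fn(model(inputs), targets).backward()
    optimizer.step()
    with torch.no_grad():
        for name, param in model.named_parameters():
            if name in radii:  # the final layer trained from scratch stays unconstrained
                project_frobenius(param, pretrained_params[name], radii[name])

Note that the paper's MARS-distance projection differs from this Frobenius example; the repository listed in the Open Source Code row contains the authors' actual implementation.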
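
The Experiment Setup row also quotes the sensitivity scheme λ_j = c·λ̂_j, γ_j = c·γ̂_j. The fragment below merely illustrates generating such a sweep; the function name scaled_settings and its argument names are hypothetical.

def scaled_settings(tuned_lambdas, tuned_gammas, scale_factors):
    # For each scaling factor c, multiply every tuned penalty weight and
    # distance radius by c, mirroring lambda_j = c * lambda_hat_j and
    # gamma_j = c * gamma_hat_j from the quoted setup.
    return [
        (c, [c * lam for lam in tuned_lambdas], [c * gam for gam in tuned_gammas])
        for c in scale_factors
    ]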