Distance-Based Regularisation of Deep Networks for Fine-Tuning

Authors: Henry Gouk, Timothy Hospedales, Massimiliano Pontil

Venue: ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical evaluation shows that our algorithm works well, corroborating our theoretical results. It outperforms both state of the art fine-tuning competitors, and penalty-based alternatives that we show do not directly constrain the radius of the search space. Section 5 (Experiments): This section provides an empirical investigation into the predictive performance of the proposed methods relative to existing approaches for regularising fine-tuning, and also conducts experiments to demonstrate which properties of the novel algorithms are responsible for the change in performance.
Researcher Affiliation | Academia | Henry Gouk & Timothy M. Hospedales, School of Informatics, University of Edinburgh ({henry.gouk,t.hospedales}@ed.ac.uk); Massimiliano Pontil, CSML, Istituto Italiano di Tecnologia & Department of Computer Science, UCL (massimiliano.pontil@iit.it)
Pseudocode | Yes | We provide pseudocode in the supplementary material that illustrates how these projections are integrated into the neural network fine-tuning procedure when using a variant of the stochastic subgradient method. (A hedged code sketch of such a projected update appears after this table.)
Open Source Code | Yes | Implementations of the methods used in this paper are available online: https://github.com/henrygouk/mars-finetuning
Open Datasets | Yes | Both networks are pre-trained on the 2012 ImageNet Large Scale Visual Recognition Challenge dataset (Russakovsky et al., 2015). We perform an empirical comparison of our two bounds, along with a bound based on the spectral norm (Long and Sedghi, 2019), to demonstrate the relative tightness. This is done by training neural networks on the MNIST dataset (LeCun et al., 1998).
Dataset Splits | No | The paper mentions 'training and validation folds' and 'Test set accuracy' but does not specify the exact percentages, sample counts, or methods for creating these splits in the main text.
Hardware Specification | No | No specific hardware details (e.g., GPU model, CPU model, memory) are mentioned for running the experiments.
Software Dependencies | No | The paper mentions using the 'Adam optimiser (Kingma and Ba, 2015)' but does not provide version numbers for any software dependencies such as Python, PyTorch, TensorFlow, or CUDA.
Experiment Setup | Yes | The Adam optimiser is used for all experiments (Kingma and Ba, 2015). Information regarding the datasets and hyperparameter optimisation procedure can be found in the supplemental material. Hyperparameters were generated according to λ_j = c·λ̂_j and γ_j = c·γ̂_j, where c is varied and λ̂_j, γ̂_j are the values found during the hyperparameter optimisation process. The network is trained for 15 epochs using the Adam optimiser (Kingma and Ba, 2015), as training any further does not result in any performance increase. The fine-tuning process is repeated five times with different random seeds to measure the robustness of each method to the composition of minibatches and the initialisation of the final linear layer, which is trained from scratch. (A brief sketch of this hyperparameter scaling also appears after this table.)
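
The Pseudocode and Experiment Setup rows above describe projections being applied within a stochastic subgradient fine-tuning loop. The supplementary pseudocode itself is not reproduced on this page; the snippet below is only a minimal PyTorch sketch of the general idea, assuming a Euclidean (Frobenius) distance constraint around the pre-trained weights. The helper names (project_frobenius, finetune_step) and the radii and pretrained_params dictionaries are illustrative assumptions, not the authors' implementation or the API of the linked repository.

import torch

def project_frobenius(param, pretrained, radius):
    # If the parameter has drifted outside the Frobenius-norm ball of the given
    # radius centred at its pre-trained value, rescale the difference back onto
    # the boundary of that ball; otherwise leave it unchanged.
    diff = param - pretrained
    norm = torch.linalg.vector_norm(diff)
    if norm > radius:
        param.copy_(pretrained + diff * (radius / norm))

def finetune_step(model, batch, loss_fn, optimizer, pretrained_params, radii):
    # One projected (sub)gradient step: an ordinary optimiser update followed by
    # a per-layer projection back into the distance-constrained search space.
    # `pretrained_params` maps parameter names to frozen copies of the initial
    # weights; `radii` maps the constrained layers to their distance budgets.
    inputs, targets = batch
    optimizer.zero_grad()
    loss_fn(model(inputs), targets).backward()
    optimizer.step()
    with torch.no_grad():
        for name, param in model.named_parameters():
            if name in radii:  # the final layer trained from scratch stays unconstrained
                project_frobenius(param, pretrained_params[name], radii[name])

Note that the paper's MARS-distance projection differs from this Frobenius example; the repository listed in the Open Source Code row contains the authors' actual implementation.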
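
The Experiment Setup row also quotes the sensitivity scheme λ_j = c·λ̂_j, γ_j = c·γ̂_j. The fragment below merely illustrates generating such a sweep; the function name scaled_settings and its argument names are hypothetical.

def scaled_settings(tuned_lambdas, tuned_gammas, scale_factors):
    # For each scaling factor c, multiply every tuned penalty weight and
    # distance radius by c, mirroring lambda_j = c * lambda_hat_j and
    # gamma_j = c * gamma_hat_j from the quoted setup.
    return [
        (c, [c * lam for lam in tuned_lambdas], [c * gam for gam in tuned_gammas])
        for c in scale_factors
    ]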