Explicit Inductive Bias for Transfer Learning with Convolutional Networks
Authors: Xuhong LI, Yves Grandvalet, Franck Davoine
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the aforementioned parameter regularizers on several pairs of source and target tasks. We use ResNet (He et al. 2016) as our base network, since it has proven its wide applicability on transfer learning tasks. Conventionally, if the target task is also a classification task, the training process starts by replacing the last layer with a new one, randomly generated, whose size depends on the number of classes in the target task. |
| Researcher Affiliation | Academia | Sorbonne universités, Université de technologie de Compiègne, CNRS, Heudiasyc, UMR 7253, Compiègne, France. |
| Pseudocode | No | The paper describes mathematical formulations of penalties but does not provide any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states, 'All the experiments are performed with Tensorflow (Abadi et al. 2015).' but does not provide specific access to its own implementation code. |
| Open Datasets | Yes | ImageNet (Deng et al. 2009) for generic object recognition and Places365 (Zhou et al. 2017) for scene classification. Likewise, we have three different databases related to three target problems: Caltech 256 (Griffin et al. 2007) contains different objects for generic object recognition; MIT Indoors 67 (Quattoni & Torralba 2009) consists of 67 indoor scene categories; Stanford Dogs 120 (Khosla et al. 2011) contains images of 120 breeds of dogs. |
| Dataset Splits | Yes | Each target database is split into training and testing sets following the suggestion of their creators (see Table 1 for details). In addition, we consider two configurations for Caltech 256: 30 or 60 examples randomly drawn from each category for training, and 20 remaining examples for test. Cross validation is used for searching the best regularization hyperparameters α and β. |
| Hardware Specification | No | We acknowledge the support of NVIDIA Corporation with the donation of GPUs used for this research. |
| Software Dependencies | No | All the experiments are performed with Tensorflow (Abadi et al. 2015). |
| Experiment Setup | Yes | Stochastic gradient descent with momentum 0.9 is used for optimization. We run 9000 iterations and divide the learning rate by 10 after 6000 iterations. The initial learning rates are 0.005, 0.01 or 0.02, depending on the tasks. Batch size is 64. |
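The Pseudocode row notes that the paper presents its penalties only as mathematical formulations and releases no code. As a reading aid, here is a minimal TensorFlow sketch of an L2-SP-style penalty that pulls the transferred weights toward their pre-trained values and applies a plain L2 penalty to the new last layer; the function name and argument structure are illustrative assumptions, not the authors' implementation, and α and β are the regularization hyperparameters that the Dataset Splits row says are selected by cross validation.

```python
import tensorflow as tf

def l2_sp_penalty(shared_vars, pretrained_vars, new_vars, alpha, beta):
    """Illustrative L2-SP-style penalty (not the authors' released code).

    shared_vars     : trainable variables transferred from the source network
    pretrained_vars : their pre-trained values (the starting point)
    new_vars        : variables of the freshly initialized last layer
    """
    # Pull the transferred weights toward the pre-trained starting point.
    sp_term = tf.add_n([tf.reduce_sum(tf.square(w - w0))
                        for w, w0 in zip(shared_vars, pretrained_vars)])
    # Ordinary L2 decay on the new, randomly initialized classifier.
    l2_term = tf.add_n([tf.reduce_sum(tf.square(w)) for w in new_vars])
    return 0.5 * alpha * sp_term + 0.5 * beta * l2_term
```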
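The Research Type row quotes the conventional fine-tuning construction: a pre-trained ResNet backbone whose last layer is replaced by a new, randomly initialized layer sized for the target task. A minimal Keras sketch of that construction follows; the choice of a ResNet-101 backbone with ImageNet weights is an assumption for illustration.

```python
import tensorflow as tf

def build_target_model(num_classes):
    # Pre-trained convolutional backbone reused from the source task.
    base = tf.keras.applications.ResNet101(
        weights="imagenet", include_top=False, pooling="avg")
    # New last layer, randomly initialized, sized for the target classes.
    head = tf.keras.layers.Dense(num_classes, activation="softmax")
    return tf.keras.Sequential([base, head])
```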
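The Experiment Setup row gives the optimization details verbatim. A sketch of that schedule with the Keras optimizer API is shown below; the initial learning rate of 0.01 is one of the three quoted values, picked here purely for illustration.

```python
import tensorflow as tf

# Learning rate starts at 0.005, 0.01 or 0.02 depending on the task and is
# divided by 10 after 6000 of the 9000 training iterations.
initial_lr = 0.01  # illustrative choice among the three quoted values
schedule = tf.keras.optimizers.schedules.PiecewiseConstantDecay(
    boundaries=[6000], values=[initial_lr, initial_lr / 10])
# Stochastic gradient descent with momentum 0.9; the batch size of 64 is set
# in the input pipeline, not on the optimizer.
optimizer = tf.keras.optimizers.SGD(learning_rate=schedule, momentum=0.9)
```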