Learning What and Where to Transfer

Authors: Yunhun Jang, Hankook Lee, Sung Ju Hwang, Jinwoo Shin

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We validate our meta-transfer approach against recent transfer learning methods on various datasets and network architectures, on which our automated scheme significantly outperforms the prior baselines that find what and where to transfer in a hand-crafted manner." Section 3 presents the experimental results under various settings.
Researcher Affiliation | Collaboration | (1) School of Electrical Engineering, KAIST, Korea; (2) OMNIOUS, Korea; (3) School of Computing, KAIST, Korea; (4) Graduate School of AI, KAIST, Korea; (5) AITRICS, Korea.
Pseudocode | Yes | "Algorithm 1: Learning of θ with meta-parameters φ"
Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the described methodology.
Open Datasets | Yes | "For 32×32 scale, we use the Tiny ImageNet dataset as a source task, and CIFAR-10, CIFAR-100 (Krizhevsky & Hinton, 2009), and STL-10 (Coates et al., 2011) datasets as target tasks. ... For 224×224 scale, the ImageNet (Deng et al., 2009) dataset is used as a source dataset, and Caltech-UCSD Birds 200 (Wah et al., 2011), MIT Indoor Scene Recognition (Quattoni & Torralba, 2009), Stanford 40 Actions (Yao et al., 2011), and Stanford Dogs (Khosla et al., 2011) datasets as target tasks."
Dataset Splits | No | The paper mentions using N training samples per class for CIFAR-10 but does not specify explicit training/validation/test splits, beyond implying that the remainder of each dataset is used for testing.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, memory amounts) used for running its experiments.
Software Dependencies | No | The paper does not list software dependencies with version numbers (e.g., specific libraries or solvers) needed to replicate the experiments.
Experiment Setup | Yes | "Our final loss L_total to train a target model then is given as: L_total(θ | x, y, φ) = L_org(θ | x, y) + β · L_wfm(θ | x, φ), where L_org is the original loss (e.g., cross-entropy) and β > 0 is a hyper-parameter. We choose T = 2 in our experiments." Algorithm 1 (Learning of θ with meta-parameters φ) begins: "Input: Dataset D_train = {(x_i, y_i)}, learning rate α. repeat: Sample a batch B ⊂ D_train with |B| = B." On the meta-networks: "For all experiments, we construct the meta-networks as 1-layer fully-connected networks for each pair (m, n) ∈ C, where C is the set of candidate pairs, or matching configuration (see Figure 3). It takes the globally average-pooled features of the m-th layer of the source network as an input, and outputs w^{m,n}_c and λ^{m,n}. As for the channel assignments w, we use the softmax activation to generate them while satisfying Σ_c w^{m,n}_c = 1, and for the transfer amount λ between layers, we commonly use ReLU6 (Krizhevsky & Hinton, 2010), max(0, min(6, x)), to ensure non-negativeness of λ and to prevent λ^{m,n} from becoming too large."
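
The quoted setup lends itself to a short sketch. The snippet below assumes PyTorch; names such as PairMetaNet and total_loss are illustrative choices, not the authors' released implementation, and the feature-matching term is a simplified stand-in for the paper's L_wfm (it assumes the target features have already been mapped to the source channel dimension). It shows how a 1-layer fully-connected meta-network could produce softmax channel assignments and a ReLU6 transfer amount from globally average-pooled source features, and how these could enter L_total = L_org + β · L_wfm.

```python
# Minimal sketch, assuming PyTorch. Class/function names and the simplified
# feature-matching term are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PairMetaNet(nn.Module):
    """1-layer FC meta-network for one candidate layer pair (m, n) in C.

    Input : globally average-pooled features of the m-th source layer.
    Output: channel assignments w^{m,n} (softmax, summing to 1 over c) and
            a transfer amount lambda^{m,n} (ReLU6, i.e. clipped to [0, 6]).
    """

    def __init__(self, src_channels: int):
        super().__init__()
        self.w_head = nn.Linear(src_channels, src_channels)  # -> w^{m,n}_c
        self.lam_head = nn.Linear(src_channels, 1)            # -> lambda^{m,n}

    def forward(self, src_feat_m: torch.Tensor):
        # (B, C, H, W) -> (B, C): global average pooling of the source features.
        pooled = F.adaptive_avg_pool2d(src_feat_m, 1).flatten(1)
        w = F.softmax(self.w_head(pooled), dim=1)             # sum_c w_c = 1
        lam = F.relu6(self.lam_head(pooled)).squeeze(1)       # 0 <= lambda <= 6
        return w, lam


def total_loss(logits, y, src_feat_m, tgt_feat_n, meta_net, beta=0.5):
    """L_total = L_org + beta * L_wfm for a single pair (m, n).

    The weighted feature-matching term below (per-channel weighted MSE between
    source and target feature maps, scaled by lambda) is a simplified stand-in
    for the paper's L_wfm; beta = 0.5 is an arbitrary placeholder value.
    """
    w, lam = meta_net(src_feat_m)                             # (B, C), (B,)
    l_org = F.cross_entropy(logits, y)                        # original loss
    per_channel = ((tgt_feat_n - src_feat_m) ** 2).mean(dim=(2, 3))  # (B, C)
    l_wfm = (lam * (w * per_channel).sum(dim=1)).mean()
    return l_org + beta * l_wfm
```

The ReLU6 head keeps λ non-negative and bounded above, which matches the quoted rationale of preventing λ^{m,n} from becoming too large, while the softmax head enforces Σ_c w^{m,n}_c = 1 by construction.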