Learning What and Where to Transfer
Authors: Yunhun Jang, Hankook Lee, Sung Ju Hwang, Jinwoo Shin
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our meta-transfer approach against recent transfer learning methods on various datasets and network architectures, on which our automated scheme significantly outperforms the prior baselines that find what and where to transfer in a hand-crafted manner. Section 3 shows our experimental results under various settings. |
| Researcher Affiliation | Collaboration | ¹School of Electrical Engineering, KAIST, Korea; ²OMNIOUS, Korea; ³School of Computing, KAIST, Korea; ⁴Graduate School of AI, KAIST, Korea; ⁵AITRICS, Korea. |
| Pseudocode | Yes | Algorithm 1 Learning of θ with meta-parameters φ |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code for the described methodology. |
| Open Datasets | Yes | For 32×32 scale, we use the Tiny ImageNet dataset as a source task, and CIFAR-10, CIFAR-100 (Krizhevsky & Hinton, 2009), and STL-10 (Coates et al., 2011) datasets as target tasks. ... For 224×224 scale, the ImageNet (Deng et al., 2009) dataset is used as a source dataset, and Caltech-UCSD Bird 200 (Wah et al., 2011), MIT Indoor Scene Recognition (Quattoni & Torralba, 2009), Stanford 40 Actions (Yao et al., 2011) and Stanford Dogs (Khosla et al., 2011) datasets as target tasks. |
| Dataset Splits | No | The paper mentions using N training samples per class of CIFAR-10 but does not specify explicit training/validation/test splits; evaluation on the standard test sets is only implied, not stated. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | Our final loss $\mathcal{L}_{\text{total}}$ to train a target model is then given as $\mathcal{L}_{\text{total}}(\theta \mid x, y, \phi) = \mathcal{L}_{\text{org}}(\theta \mid x, y) + \beta \mathcal{L}_{\text{wfm}}(\theta \mid x, \phi)$, where $\mathcal{L}_{\text{org}}$ is the original loss (e.g., cross entropy) and $\beta > 0$ is a hyper-parameter. We choose $T = 2$ in our experiments. ... Algorithm 1: Learning of $\theta$ with meta-parameters $\phi$. Input: dataset $D_{\text{train}} = \{(x_i, y_i)\}$, learning rate $\alpha$. Repeat: sample a batch $B \subset D_{\text{train}}$ with $\lvert B \rvert = B$. ... For all experiments, we construct the meta-networks as 1-layer fully-connected networks for each pair $(m, n) \in \mathcal{C}$, where $\mathcal{C}$ is the set of candidate pairs, or matching configuration (see Figure 3). Each takes the globally average-pooled features of the $m$-th layer of the source network as input and outputs $w^{m,n}_c$ and $\lambda^{m,n}$. For the channel assignments $w$, we use the softmax activation so that $\sum_c w^{m,n}_c = 1$; for the transfer amount $\lambda^{m,n}$ between layers, we use ReLU6 (Krizhevsky & Hinton, 2010), $\max(0, \min(6, x))$, to ensure non-negativity of $\lambda$ and to prevent $\lambda^{m,n}$ from becoming too large. |
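
To make the Experiment Setup row concrete, here is a minimal PyTorch-style sketch of the pieces quoted above: a 1-layer fully-connected meta-network per candidate pair (m, n) that outputs softmax channel weights $w^{m,n}$ (summing to 1) and a ReLU6-capped transfer amount $\lambda^{m,n}$, combined into the total loss $\mathcal{L}_{\text{org}} + \beta \mathcal{L}_{\text{wfm}}$. This is an illustrative reconstruction under assumptions, not the authors' released code: the names (`PairMetaNetwork`, `weighted_feature_matching`, `total_loss`, `pair_feats`), the placeholder `beta` value, and the assumption that target features are already transformed to the source layer's channel and spatial shape are all ours.

```python
# Hypothetical sketch of the loss/meta-network structure described in the paper;
# names, shapes, and defaults are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PairMetaNetwork(nn.Module):
    """1-layer fully-connected meta-network for one candidate pair (m, n):
    takes globally average-pooled source features and outputs channel weights
    w (softmax, so they sum to 1) and a transfer amount lambda (ReLU6-capped)."""

    def __init__(self, source_channels: int):
        super().__init__()
        self.w_head = nn.Linear(source_channels, source_channels)
        self.lam_head = nn.Linear(source_channels, 1)

    def forward(self, source_feat: torch.Tensor):
        # source_feat: (batch, C_m, H, W) feature map from source layer m
        pooled = source_feat.mean(dim=(2, 3))              # global average pooling
        w = F.softmax(self.w_head(pooled), dim=1)          # sum_c w_c = 1
        lam = F.relu6(self.lam_head(pooled)).squeeze(1)    # 0 <= lambda <= 6
        return w, lam


def weighted_feature_matching(source_feat, target_feat, w):
    """Channel-wise weighted squared distance; assumes target_feat has already
    been transformed to the same (C, H, W) shape as source_feat."""
    per_channel = ((target_feat - source_feat) ** 2).mean(dim=(2, 3))  # (batch, C)
    return (w * per_channel).sum(dim=1)                                # (batch,)


def total_loss(logits, labels, pair_feats, meta_nets, beta=0.5):
    """L_total = L_org + beta * L_wfm, summing lambda-weighted matching losses
    over all candidate pairs (m, n). beta=0.5 is a placeholder value."""
    org = F.cross_entropy(logits, labels)                  # original loss
    wfm = 0.0
    for pair, (source_feat, target_feat) in pair_feats.items():
        w, lam = meta_nets[pair](source_feat)
        wfm = wfm + (lam * weighted_feature_matching(source_feat, target_feat, w)).mean()
    return org + beta * wfm
```

The sketch only shows the per-batch loss being optimized; the full Algorithm 1 in the paper additionally describes how θ is updated batch-by-batch with learning rate α and how the meta-parameters φ are learned.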