Incremental Multi-Domain Learning with Network Latent Tensor Factorization
Authors: Adrian Bulat, Jean Kossaifi, Georgios Tzimiropoulos, Maja Pantic (pp. 10470-10477)
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply the proposed method to the 10 datasets of the Visual Decathlon Challenge and show that our method offers on average about a 7.5× reduction in the number of parameters and competitive performance in terms of both classification accuracy and Decathlon score. |
| Researcher Affiliation | Collaboration | Adrian Bulat (1), Jean Kossaifi (1,2), Georgios Tzimiropoulos (1), Maja Pantic (1,2); (1) Samsung AI Center Cambridge, (2) Imperial College London |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described in this paper. It mentions using PyTorch and TensorLy, which are third-party libraries. |
| Open Datasets | Yes | We evaluate our method on the 10 different datasets from very different visual domains that compose the Decathlon challenge (Rebuffi, Bilen, and Vedaldi 2017). Note that these datasets were modified in (Rebuffi, Bilen, and Vedaldi 2017), mainly by resizing and cropping them to the same resolution (72×72px). This challenge explicitly assesses methods designed to solve incremental multi-domain learning without catastrophic forgetting. ImageNet (Russakovsky et al. 2015) contains 1.2 million images distributed across 1000 classes. Following (Rebuffi, Bilen, and Vedaldi 2017; 2018; Rosenfeld and Tsotsos 2017), this was used as the source domain to train the shared low-rank manifold for our model as detailed in Eq. (2). The FGVC-Aircraft Benchmark (Airc.) (Maji et al. 2013) contains 10,000 aircraft images across 100 different classes; CIFAR100 (C100) (Krizhevsky and Hinton 2009) is composed of 60,000 small images in 100 classes; the Daimler Mono Pedestrian Classification Benchmark (DPed) (Munder and Gavrila 2006) is a dataset for pedestrian detection (binary classification) composed of 50,000 images; the Describable Texture Dataset (DTD) (Cimpoi et al. 2014) contains 5,640 images for 47 texture categories; the German Traffic Sign Recognition (GTSR) Benchmark (Stallkamp et al. 2012) is a dataset of 50,000 images of 43 traffic sign categories; Flowers102 (Flwr) (Nilsback and Zisserman 2008) contains 102 flower categories with between 40 and 258 images per class; Omniglot (OGlt) (Lake, Salakhutdinov, and Tenenbaum 2015) is a dataset of 32,000 images representing 1,623 handwritten characters from 50 different alphabets; the Street View House Numbers (SVHN) (Netzer et al. 2011) is a digit recognition dataset containing 70,000 images in 10 classes. Finally, UCF101 (UCF) (Soomro, Zamir, and Shah 2012) is an action recognition dataset composed of 13,320 images representing 101 action classes. |
| Dataset Splits | Yes | We evaluate our method on the 10 different datasets from very different visual domains that compose the Decathlon challenge (Rebuffi, Bilen, and Vedaldi 2017). Note that these datasets were modified in (Rebuffi, Bilen, and Vedaldi 2017), mainly by resizing and cropping them to the same resolution (72×72px). The regularization parameter λ was validated on a small validation set. Table 2: Mean Top-1 accuracy (%) on the unseen validation set, reported for two settings |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions using PyTorch and TensorLy, but does not specify their version numbers (e.g., 'PyTorch 1.9'). |
| Experiment Setup | Yes | We first train our adapted ResNet-26 model on ImageNet for 90 epochs using SGD with momentum (0.9), using a learning rate of 0.1 that is decreased in steps by 10 every 30 epochs. To avoid overfitting, we use a weight decay equal to 10^-5. During training, we follow best practices and randomly apply scale jittering, random cropping and flipping. We initialize our weights from a normal distribution N(0, 0.002), before decomposing them using Tucker decomposition (Section 3). Finally, we train the obtained core and factors (via backpropagation) by reconstructing the weights on the fly. For the remaining 9 domains, we load the task-independent core and the factors trained on ImageNet, freeze the core weights and only fine-tune the factors, batch-norm layers and the two 1×1 projection layers, all of which account for 3.5% of the total number of parameters. The linear layer at the end of the network is trained from scratch for each task and was initialized from a uniform distribution. Depending on the size of the dataset, we adjust the weight decay to avoid overfitting (10^-5 for larger datasets, and up to 0.005 for the smaller ones, e.g. Flowers102). |
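The factorization strategy quoted in the setup row (a shared Tucker core with per-task factors, weights reconstructed on the fly) can be illustrated with a minimal numpy sketch. This is not the authors' released code; the shapes, ranks, and helper names (`mode_product`, `tucker_to_tensor`) are illustrative assumptions, and a real implementation would use TensorLy's decomposition routines on the network's stacked weight tensor.

```python
import numpy as np

def mode_product(tensor, matrix, mode):
    """n-mode product: multiply `tensor` by `matrix` along axis `mode`."""
    t = np.moveaxis(tensor, mode, 0)
    shape = t.shape
    t = matrix @ t.reshape(shape[0], -1)
    return np.moveaxis(t.reshape((matrix.shape[0],) + shape[1:]), 0, mode)

def tucker_to_tensor(core, factors):
    """Reconstruct the full weight tensor from a Tucker core and factor matrices."""
    out = core
    for mode, U in enumerate(factors):
        out = mode_product(out, U, mode)
    return out

rng = np.random.default_rng(0)

# Hypothetical conv-weight tensor shape (out_ch, in_ch, kH, kW) and Tucker ranks
full_shape = (64, 32, 3, 3)
ranks = (32, 16, 3, 3)

# The paper initializes weights from N(0, 0.002) before decomposing; here we
# draw the core and factors directly at that scale for illustration.
core = rng.normal(0.0, 0.002, ranks)
factors = [rng.normal(0.0, 0.002, (dim, r)) for dim, r in zip(full_shape, ranks)]

# Weights are reconstructed "on the fly" from the shared core and the factors
W = tucker_to_tensor(core, factors)
print(W.shape)  # (64, 32, 3, 3)

# Per-task cost: the core is frozen and shared across domains; only the much
# smaller factor matrices (plus batch-norm and projection layers) are fine-tuned.
n_core = int(np.prod(ranks))
n_factors = sum(f.size for f in factors)
print(n_core, n_factors)
```

In this toy configuration the frozen core holds 4,608 parameters while the trainable factors hold 2,578, which mirrors the paper's claim that the per-task trainable portion is a small fraction of the whole network.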