Incremental Multi-Domain Learning with Network Latent Tensor Factorization
Authors: Adrian Bulat, Jean Kossaifi, Georgios Tzimiropoulos, Maja Pantic (pp. 10470-10477)
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply the proposed method to the 10 datasets of the Visual Decathlon Challenge and show that our method offers on average about a 7.5× reduction in the number of parameters and competitive performance in terms of both classification accuracy and Decathlon score. |
| Researcher Affiliation | Collaboration | Adrian Bulat (1), Jean Kossaifi (1,2), Georgios Tzimiropoulos (1), Maja Pantic (1,2); (1) Samsung AI Center Cambridge, (2) Imperial College London |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described in this paper. It mentions using PyTorch and TensorLy, which are third-party libraries. |
| Open Datasets | Yes | We evaluate our method on the 10 different datasets from very different visual domains that compose the Decathlon challenge (Rebuffi, Bilen, and Vedaldi 2017). Note that these datasets were modified in (Rebuffi, Bilen, and Vedaldi 2017), mainly by resizing and cropping them to the same resolution (72×72px). This challenge explicitly assesses methods designed to solve incremental multi-domain learning without catastrophic forgetting. ImageNet (Russakovsky et al. 2015) contains 1.2 million images distributed across 1000 classes. Following (Rebuffi, Bilen, and Vedaldi 2017; 2018; Rosenfeld and Tsotsos 2017), this was used as the source domain to train the shared low-rank manifold for our model as detailed in Eq. (2). The FGVC-Aircraft Benchmark (Airc.) (Maji et al. 2013) contains 10,000 aircraft images across 100 different classes; CIFAR100 (C100) (Krizhevsky and Hinton 2009) is composed of 60,000 small images in 100 classes; the Daimler Mono Pedestrian Classification Benchmark (DPed) (Munder and Gavrila 2006) is a dataset for pedestrian detection (binary classification) composed of 50,000 images; the Describable Texture Dataset (DTD) (Cimpoi et al. 2014) contains 5,640 images for 47 texture categories; the German Traffic Sign Recognition (GTSR) Benchmark (Stallkamp et al. 2012) is a dataset of 50,000 images of 43 traffic sign categories; Flowers102 (Flwr) (Nilsback and Zisserman 2008) contains 102 flower categories with between 40 and 258 images per class; Omniglot (OGlt) (Lake, Salakhutdinov, and Tenenbaum 2015) is a dataset of 32,000 images representing 1,623 handwritten characters from 50 different alphabets; the Street View House Numbers (SVHN) (Netzer et al. 2011) is a digit recognition dataset containing 70,000 images in 10 classes. Finally, UCF101 (UCF) (Soomro, Zamir, and Shah 2012) is an action recognition dataset composed of 13,320 images representing 101 action classes. |
| Dataset Splits | Yes | We evaluate our method on the 10 different datasets from very different visual domains that compose the Decathlon challenge (Rebuffi, Bilen, and Vedaldi 2017). Note that these datasets were modified in (Rebuffi, Bilen, and Vedaldi 2017), mainly by resizing and cropping them to the same resolution (72×72px). The regularization parameter λ was validated on a small validation set. Table 2: Mean Top-1 accuracy (%) on the unseen validation set, reported for two settings |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions using PyTorch and TensorLy, but does not specify their version numbers (e.g., 'PyTorch 1.9'). |
| Experiment Setup | Yes | We first train our adapted ResNet-26 model on ImageNet for 90 epochs using SGD with momentum (0.9), using a learning rate of 0.1 that is decreased in steps by 10 every 30 epochs. To avoid overfitting, we use a weight decay equal to 10^-5. During training, we follow best practices and randomly apply scale jittering, random cropping and flipping. We initialize our weights from a normal distribution N(0, 0.002), before decomposing them using Tucker decomposition (Section 3). Finally, we train the obtained core and factors (via backpropagation) by reconstructing the weights on the fly. For the remaining 9 domains, we load the task-independent core and the factors trained on ImageNet, freeze the core weights and only fine-tune the factors, batch-norm layers and the two 1×1 projection layers, all of which account for 3.5% of the total number of parameters. The linear layer at the end of the network is trained from scratch for each task and was initialized from a uniform distribution. Depending on the size of the dataset, we adjust the weight decay to avoid overfitting (10^-5 for larger datasets, and up to 0.005 for the smaller ones, e.g. Flowers102). |
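The factorization strategy quoted in the setup row (a shared Tucker core with per-task factors, weights reconstructed on the fly) can be illustrated with a minimal numpy sketch. This is not the authors' released code; the shapes, ranks, and helper names (`mode_product`, `tucker_to_tensor`) are illustrative assumptions, and a real implementation would use TensorLy's decomposition routines on the network's stacked weight tensor.

```python
import numpy as np

def mode_product(tensor, matrix, mode):
    """n-mode product: multiply `tensor` by `matrix` along axis `mode`."""
    t = np.moveaxis(tensor, mode, 0)
    shape = t.shape
    t = matrix @ t.reshape(shape[0], -1)
    return np.moveaxis(t.reshape((matrix.shape[0],) + shape[1:]), 0, mode)

def tucker_to_tensor(core, factors):
    """Reconstruct the full weight tensor from a Tucker core and factor matrices."""
    out = core
    for mode, U in enumerate(factors):
        out = mode_product(out, U, mode)
    return out

rng = np.random.default_rng(0)

# Hypothetical conv-weight tensor shape (out_ch, in_ch, kH, kW) and Tucker ranks
full_shape = (64, 32, 3, 3)
ranks = (32, 16, 3, 3)

# The paper initializes weights from N(0, 0.002) before decomposing; here we
# draw the core and factors directly at that scale for illustration.
core = rng.normal(0.0, 0.002, ranks)
factors = [rng.normal(0.0, 0.002, (dim, r)) for dim, r in zip(full_shape, ranks)]

# Weights are reconstructed "on the fly" from the shared core and the factors
W = tucker_to_tensor(core, factors)
print(W.shape)  # (64, 32, 3, 3)

# Per-task cost: the core is frozen and shared across domains; only the much
# smaller factor matrices (plus batch-norm and projection layers) are fine-tuned.
n_core = int(np.prod(ranks))
n_factors = sum(f.size for f in factors)
print(n_core, n_factors)
```

In this toy configuration the frozen core holds 4,608 parameters while the trainable factors hold 2,578, which mirrors the paper's claim that the per-task trainable portion is a small fraction of the whole network.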