Adaptive Sharing for Image Classification

Authors: Li Shen, Gang Sun, Zhouchen Lin, Qingming Huang, Enhua Wu

IJCAI 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results demonstrate that our method can significantly improve the classification performance by transferring knowledge appropriately.
Researcher Affiliation | Academia | (1) University of Chinese Academy of Sciences, Beijing, China; (2) State Key Lab. of Computer Science, Inst. of Software, CAS, Beijing, China; (3) Key Laboratory of Machine Perception (MOE), School of EECS, Peking University, Beijing, China; (4) Cooperative Medianet Innovation Center, Shanghai, China; (5) Key Lab. of Intell. Info. Process., Inst. of Comput. Tech., Chinese Academy of Sciences, China; (6) University of Macau, Macao, China
Pseudocode | No | The paper describes algorithmic steps in paragraph form but does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper states, "Our implementation is based on the publicly available code of Caffe [Jia et al., 2014]," but does not provide a link or an explicit statement about releasing the source code for the proposed Adaptive Sharing method.
Open Datasets | Yes | The CUB-200-2010 (Birds200) is a widely used dataset for fine-grained classification (FGC). It contains 6,033 images of birds belonging to 200 species; only 15 images per class are used for training, and the rest are used for testing. The CIFAR-100 dataset [Krizhevsky, 2009] is composed of 32×32 color images belonging to 100 classes, with 50,000 images for training and 10,000 images for testing. We perform additional experiments on the ImageNet 2012 classification dataset [Russakovsky et al., 2014], which is a challenging dataset with 1000 classes. (See the dataset-loading sketch after this table.)
Dataset Splits | Yes | We obtain 57.7% top-1 accuracy and 81.3% top-5 accuracy on the validation set, where the improvements over the baseline are 0.6% and 1.1%, respectively. (See the top-k accuracy sketch after this table.)
Hardware Specification | Yes | Our model is trained on a single Tesla K40 GPU within two weeks.
Software Dependencies | No | The paper mentions using "Caffe [Jia et al., 2014]" but does not provide specific version numbers for Caffe or any other software dependencies.
Experiment Setup | Yes | We train the networks by applying stochastic gradient descent with a mini-batch size of 128 and a fixed momentum of 0.9. The training is regularized by weight decay (the l2 penalty factor is set to 0.004). In particular, the parameter matrix S in Adaptive Sharing is regularized with l1 weight decay (the penalty factor is set to 0.0005). The learning rate is initialized to 0.001 and divided by 10 when the error plateaus. (See the training-configuration sketch below.)
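
The Open Datasets row above quotes the CIFAR-100 split (50,000 training / 10,000 test images of size 32×32 over 100 classes). As a point of reference only, here is a minimal sketch of loading that split with torchvision; the paper's own implementation is Caffe-based, and the data directory and preprocessing below are illustrative placeholders.

```python
# Illustrative only: loads the standard CIFAR-100 train/test split quoted above.
import torchvision
import torchvision.transforms as T

transform = T.Compose([T.ToTensor()])  # placeholder preprocessing

cifar_train = torchvision.datasets.CIFAR100(root="./data", train=True,
                                            download=True, transform=transform)
cifar_test = torchvision.datasets.CIFAR100(root="./data", train=False,
                                           download=True, transform=transform)

print(len(cifar_train), len(cifar_test))  # 50000 10000
```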
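
The Dataset Splits row reports top-1 and top-5 accuracy on the validation set. For readers who want the metric spelled out, here is a small sketch of how top-k accuracy is typically computed; the function name, tensor shapes, and example values are assumptions for illustration, not taken from the paper.

```python
import torch

def topk_accuracy(logits, targets, ks=(1, 5)):
    """Fraction of samples whose true class appears in the top-k predictions."""
    maxk = max(ks)
    _, pred = logits.topk(maxk, dim=1)    # (N, maxk) predicted class indices, best first
    hits = pred.eq(targets.unsqueeze(1))  # (N, maxk) boolean matches against the labels
    return {k: hits[:, :k].any(dim=1).float().mean().item() for k in ks}

# Toy example: 4 samples, 10 classes
logits = torch.randn(4, 10)
targets = torch.tensor([3, 7, 1, 0])
print(topk_accuracy(logits, targets))  # e.g. {1: 0.25, 5: 0.75}
```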
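
The Experiment Setup row lists the optimization hyperparameters (mini-batch 128, momentum 0.9, l2 weight decay 0.004, l1 weight decay 0.0005 on the sharing matrix S, initial learning rate 0.001 divided by 10 on plateau). The sketch below wires those numbers into a PyTorch-style training step; the linear classifier, the stand-in matrix S, the synthetic batch, and the plateau-scheduler settings are assumptions, since the paper's implementation is Caffe-based and its architecture is not reproduced here.

```python
import torch
import torch.nn as nn

# Toy stand-ins (not the paper's architecture): a linear classifier over CIFAR-100-sized
# inputs, plus a matrix S standing in for Adaptive Sharing's parameter matrix.
model = nn.Linear(3 * 32 * 32, 100)
S = nn.Parameter(torch.zeros(100, 100))

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(
    [
        {"params": model.parameters(), "weight_decay": 0.004},  # l2 penalty factor
        {"params": [S], "weight_decay": 0.0},                   # S gets the l1 penalty below instead
    ],
    lr=0.001,      # initial learning rate
    momentum=0.9,  # fixed momentum
)
# Divide the learning rate by 10 when the monitored error plateaus.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min", factor=0.1)

# One illustrative step on a synthetic mini-batch of 128 samples.
images = torch.randn(128, 3 * 32 * 32)
labels = torch.randint(0, 100, (128,))
optimizer.zero_grad()
loss = criterion(model(images), labels) + 0.0005 * S.abs().sum()  # l1 weight decay on S
loss.backward()
optimizer.step()
scheduler.step(loss.item())  # in practice, step once per epoch on the validation error
```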