DeepME: Deep Mixture Experts for Large-scale Image Classification

Authors: Ming He, Guangyi Lv, Weidong He, Jianping Fan, Guihua Zeng

IJCAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental results on ImageNet10K have demonstrated that our proposed deep mixture algorithm can achieve very competitive results (top-1 accuracy: 32.13%) on large-scale image classification tasks. We performed experiments on ImageNet10K, which is one of the most well-known image datasets for visual classification, and contains 10,184 image categories and 9M images. Furthermore, we use an 85%-5%-10% train/validation/test split.
Researcher Affiliation | Collaboration | Ming He (1,2,4), Guangyi Lv (2), Weidong He (2), Jianping Fan (3), Guihua Zeng (1); (1) Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai 200240, China; (2) University of Science and Technology of China, Hefei, Anhui 230000, China; (3) AI Lab at Lenovo Research, Beijing, China; (4) Didi Chuxing, Beijing, China
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the methodology is openly available.
Open Datasets | Yes | We use the ImageNet10K dataset [Deng et al., 2009] with 10,184 image categories to evaluate our deep mixture algorithm on large-scale image classification. We performed experiments on ImageNet10K, which is one of the most well-known image datasets for visual classification, and contains 10,184 image categories and 9M images.
Dataset Splits | Yes | Furthermore, we use an 85%-5%-10% train/validation/test split.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions 'Stochastic Gradient Descent (SGD) with momentum 0.9' and the 'Glorot Normal initializer,' but does not provide version numbers for any software libraries or dependencies (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | In DeepME, the training process consists of two parts: 1. Training of the base CNN. We utilize Stochastic Gradient Descent (SGD) with momentum 0.9 to learn the base network. The training is divided into two stages: 1) the warm-up stage and 2) the fine-tuning stage. In the warm-up stage, the learning rate is decayed exponentially from 0.01 to 0.001, while in the fine-tuning stage it is decayed from 0.001 to 0.00001. We use batch size 256 and L2 regularization on the corresponding parameters with weight 0.0005. 2. Training of the gate network. To initialize the parameters of the gate network, the Glorot Normal initializer is adopted as suggested in [Orr and Müller, 2003]. SGD with momentum is also used to learn the network, and the batch size is 256. The initial learning rate is 0.001 and is then exponentially decayed to 0.0001.
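
The reported experiment setup can be approximated in code. The following is a minimal sketch, assuming PyTorch (the paper does not name a framework); the model definitions, epoch counts, and names such as `base_cnn`, `gate_net`, and `make_stage` are illustrative placeholders, and only the optimizer, momentum, weight decay, batch size, initializer, and learning-rate ranges reflect the settings quoted above.

```python
# Sketch of the reported DeepME training configuration (assumes PyTorch).
# Models and epoch counts are placeholders; the paper does not specify them.
import torch
from torch import nn
from torch.optim.lr_scheduler import ExponentialLR


def make_stage(model: nn.Module, lr_start: float, lr_end: float, epochs: int):
    """SGD (momentum 0.9, L2 weight 0.0005) with a learning rate decayed
    exponentially from lr_start to lr_end over `epochs` epochs."""
    opt = torch.optim.SGD(model.parameters(), lr=lr_start,
                          momentum=0.9, weight_decay=5e-4)
    gamma = (lr_end / lr_start) ** (1.0 / max(epochs - 1, 1))
    return opt, ExponentialLR(opt, gamma=gamma)


# 1. Base CNN: warm-up stage (0.01 -> 0.001), then fine-tuning (0.001 -> 0.00001).
base_cnn = nn.Sequential(nn.Conv2d(3, 64, 3), nn.ReLU(), nn.Flatten())  # placeholder model
warmup_opt, warmup_sched = make_stage(base_cnn, 0.01, 0.001, epochs=30)
finetune_opt, finetune_sched = make_stage(base_cnn, 0.001, 0.00001, epochs=30)

# 2. Gate network: Glorot (Xavier) Normal initialization, SGD with momentum,
#    learning rate 0.001 exponentially decayed to 0.0001.
gate_net = nn.Linear(2048, 64)  # placeholder dimensions
nn.init.xavier_normal_(gate_net.weight)
gate_opt, gate_sched = make_stage(gate_net, 0.001, 0.0001, epochs=30)

# Both parts use batch size 256, e.g.:
# loader = torch.utils.data.DataLoader(dataset, batch_size=256, shuffle=True)
```

In this sketch each scheduler's `step()` would be called once per epoch, so the learning rate reaches the reported end value by the final epoch of each stage.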