Accelerated Training for Massive Classification via Dynamic Class Selection

Authors: Xingcheng Zhang, Lei Yang, Junjie Yan, Dahua Lin

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | On several large-scale benchmarks, our method significantly reduces the training cost and memory demand while maintaining competitive performance. We test our method on three face recognition/verification benchmarks, the application that motivates this work. We not only compare it with various methods, but also investigate how different factors influence performance and cost via a series of ablation studies.
Researcher Affiliation | Collaboration | Xingcheng Zhang (1), Lei Yang (1), Junjie Yan (2), Dahua Lin (1); (1) Department of Information Engineering, The Chinese University of Hong Kong; (2) SenseTime Group Limited
Pseudocode | Yes | Algorithm 1: Build Hashing Tree (see the sketch below the table).
Open Source Code | No | The paper does not provide an explicit statement about the release of its source code or a link to a code repository for its methodology.
Open Datasets | Yes | MS-Celeb-1M (Guo et al. 2016); Megaface (MF2) (Kemelmacher-Shlizerman et al. 2016); LFW (Huang et al. 2007); IJB-A (Klare et al. 2015); and Megaface & Facescrub (Kemelmacher-Shlizerman et al. 2016; Ng and Winkler 2014).
Dataset Splits | No | The paper mentions training epochs and monitoring CPK values but does not specify explicit training/validation/test splits with percentages or sample counts. It refers to distinct training and testing sets, but not to a validation split drawn from the training data.
Hardware Specification | Yes | On the other hand, current GPUs only come with limited memory capacity, e.g. the memory capacity of Tesla P100 is up to 16 GB. The training is done on a server with 8 NVIDIA Titan X GPUs.
Software Dependencies | No | The paper mentions network architectures such as Hynet and ResNet-101 and the use of SGD with momentum, but does not provide specific version numbers for any software dependencies (e.g., Python, PyTorch, TensorFlow, CUDA).
Experiment Setup | Yes | For all settings, the networks are trained using SGD with momentum. The mini-batch sizes are set to 512 and 256, respectively, for Hynet and ResNet-101. We will rebuild the hashing forest every T iterations in order to stay updated. We set M to be the minimum number such that the average top-M cumulative probability is above a threshold τ_cp.
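
The Pseudocode row refers to the paper's Algorithm 1 (Build Hashing Tree). The following is a minimal sketch of one plausible random-projection variant, not the authors' exact procedure: the class Node, the function build_hashing_tree, the median-based split, and the leaf_size value are illustrative assumptions.

```python
import numpy as np

class Node:
    """Node of a random-projection hashing tree (illustrative sketch)."""
    def __init__(self, indices=None, direction=None, threshold=None,
                 left=None, right=None):
        self.indices = indices      # class indices stored at a leaf
        self.direction = direction  # random projection direction (internal node)
        self.threshold = threshold  # split threshold on projected values
        self.left = left
        self.right = right

def build_hashing_tree(weights, indices=None, leaf_size=64, rng=None):
    """Recursively partition class weight vectors with random projections.

    weights:   (num_classes, dim) array of classifier weight vectors.
    leaf_size: stop splitting once a node holds this many classes (assumed value).
    """
    rng = np.random.default_rng() if rng is None else rng
    if indices is None:
        indices = np.arange(weights.shape[0])
    if len(indices) <= leaf_size:
        return Node(indices=indices)                    # leaf: store the classes

    direction = rng.standard_normal(weights.shape[1])   # random projection axis
    projections = weights[indices] @ direction
    threshold = np.median(projections)                  # median split for balance
    left_mask = projections <= threshold
    if left_mask.all() or not left_mask.any():          # degenerate split: stop
        return Node(indices=indices)

    return Node(direction=direction, threshold=threshold,
                left=build_hashing_tree(weights, indices[left_mask], leaf_size, rng),
                right=build_hashing_tree(weights, indices[~left_mask], leaf_size, rng))
```

Per the quoted experiment setup, several such trees would form a hashing forest that is rebuilt every T iterations so that, in line with the paper's dynamic class selection, only a retrieved subset of classes enters the softmax at each step.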
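
The Experiment Setup row also states that M is chosen as the smallest number whose average top-M cumulative probability exceeds a threshold τ_cp. Below is a minimal sketch of that selection rule, assuming softmax probabilities are gathered for a monitoring batch; the function name, the batch-wise averaging, and the example value of τ_cp are assumptions for illustration.

```python
import numpy as np

def smallest_m_above_threshold(probs, tau_cp=0.95):
    """Smallest M whose average top-M cumulative probability exceeds tau_cp.

    probs:  (batch_size, num_classes) softmax probabilities for a monitoring batch.
    tau_cp: cumulative-probability threshold (0.95 is an assumed example value).
    """
    # Sort each row in descending order and accumulate the probability mass.
    sorted_probs = np.sort(probs, axis=1)[:, ::-1]
    cumulative = np.cumsum(sorted_probs, axis=1)   # (batch_size, num_classes)
    avg_cumulative = cumulative.mean(axis=0)       # average over the batch
    # The last entry is ~1.0, so a crossing index always exists for tau_cp <= 1.
    return int(np.argmax(avg_cumulative >= tau_cp)) + 1

# Usage: pick M from a batch of (numerically stable) softmax outputs.
rng = np.random.default_rng(0)
logits = rng.standard_normal((256, 10_000))
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)
M = smallest_m_above_threshold(probs, tau_cp=0.95)
```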