BatchQuant: Quantized-for-all Architecture Search with Robust Quantizer

Authors: Haoping Bai, Meng Cao, Ping Huang, Jiulong Shan

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate the effectiveness of our method on ImageNet and achieve SOTA Top-1 accuracy under a low complexity constraint (< 20 MFLOPs)." (Section 5: Experimental Analysis and Results)
Researcher Affiliation | Industry | "Haoping Bai, Meng Cao, Ping Huang, Jiulong Shan ({haoping_bai, mengcao, huang_ping, jiulong_shan}@apple.com)"
Pseudocode | No | The paper describes methods and formulas but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | "The code and models will be made publicly available at https://github.com/bhpfelix/QFA."
Open Datasets | Yes | "We demonstrate the effectiveness of our method on ImageNet and achieve SOTA Top-1 accuracy under a low complexity constraint (< 20 MFLOPs)." The paper cites: "ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248-255, 2009."
Dataset Splits | Yes | "After training is complete, we randomly sample 16k quantized subnets and evaluate on 10k validation images sampled from the training set to train the accuracy predictor."
Hardware Specification | Yes | "We train with a batch size of 2048 across 32 V100 GPUs on our internal cluster."
Software Dependencies | No | The paper mentions basing its codebase on an open-source implementation but does not provide specific version numbers for software dependencies such as PyTorch, TensorFlow, or CUDA.
Experiment Setup | Yes | "For both stages of the elastic quantization procedure, we follow the common hyperparameter choice of [22] and use an initial learning rate of 0.08. For all experiments, we clip the global norm of the gradient at 500. We train with a batch size of 2048 across 32 V100 GPUs on our internal cluster. ... During the evolutionary search, we keep a population size of 500 for 1000 generations. For each generation, once we identify the Pareto population based on nondominated sorting and crowding distance, we breed new genotypes through crossover and mutation with a crossover probability of 0.007 and a mutation probability of 0.02."
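The evolutionary search quoted above (identify the Pareto population via nondominated sorting and crowding distance, then breed new genotypes by crossover and mutation) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the genotype encoding (a flat tuple of per-layer choice indices) and the two-objective fitness (accuracy to maximize, FLOPs to minimize) are assumptions standing in for the paper's actual search space and accuracy predictor.

```python
import random

def dominates(a, b):
    """Pareto dominance for fitness tuples (accuracy, flops):
    maximize accuracy, minimize FLOPs."""
    return a[0] >= b[0] and a[1] <= b[1] and a != b

def nondominated_sort(population, fitness):
    """Split genotypes into Pareto fronts; front 0 is the nondominated set."""
    fronts, remaining = [], list(population)
    while remaining:
        front = [p for p in remaining
                 if not any(dominates(fitness[q], fitness[p]) for q in remaining)]
        fronts.append(front)
        remaining = [p for p in remaining if p not in front]
    return fronts

def crowding_distance(front, fitness):
    """Crowding distance per genotype, summed over both objectives;
    boundary points get infinite distance so they are always kept."""
    dist = {g: 0.0 for g in front}
    for obj in range(2):
        ordered = sorted(front, key=lambda g: fitness[g][obj])
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")
        span = (fitness[ordered[-1]][obj] - fitness[ordered[0]][obj]) or 1.0
        for i in range(1, len(ordered) - 1):
            dist[ordered[i]] += (fitness[ordered[i + 1]][obj]
                                 - fitness[ordered[i - 1]][obj]) / span
    return dist

def breed(parents, n_choices, p_cross=0.007, p_mut=0.02, rng=random):
    """Breed one child from two Pareto parents: per-gene crossover with
    probability p_cross, then per-gene mutation with probability p_mut
    (the probabilities quoted in the paper)."""
    a, b = rng.sample(parents, 2)
    child = [y if rng.random() < p_cross else x for x, y in zip(a, b)]
    return tuple(rng.randrange(n_choices) if rng.random() < p_mut else g
                 for g in child)
```

In the paper's setting this loop would run with a population of 500 for 1000 generations; here the pieces are shown in isolation so each step of the quoted procedure is explicit.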