BatchQuant: Quantized-for-all Architecture Search with Robust Quantizer
Authors: Haoping Bai, Meng Cao, Ping Huang, Jiulong Shan
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of our method on ImageNet and achieve SOTA Top-1 accuracy under a low complexity constraint (< 20 MFLOPs). (Section 5, Experimental Analysis and Results) |
| Researcher Affiliation | Industry | Haoping Bai, Meng Cao, Ping Huang, Jiulong Shan ({haoping_bai, mengcao, huang_ping, jiulong_shan}@apple.com) |
| Pseudocode | No | The paper describes methods and formulas but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code and models will be made publicly available at https://github.com/bhpfelix/QFA. |
| Open Datasets | Yes | We demonstrate the effectiveness of our method on ImageNet and achieve SOTA Top-1 accuracy under a low complexity constraint (< 20 MFLOPs). ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255, 2009. |
| Dataset Splits | Yes | After training is complete, we randomly sample 16k quantized subnets and evaluate on 10k validation images sampled from the training set to train the accuracy predictor. (A sketch of this collection step follows the table.) |
| Hardware Specification | Yes | We train with a batch size of 2048 across 32 V100 GPUs on our internal cluster. |
| Software Dependencies | No | The paper mentions basing its codebase on an open-source implementation but does not provide specific version numbers for software dependencies like PyTorch, TensorFlow, or CUDA. |
| Experiment Setup | Yes | For both stages of the elastic quantization procedure, we follow the common hyperparameter choice of [22] and use an initial learning rate of 0.08. For all experiments, we clip the global norm of the gradient at 500. We train with a batch size of 2048 across 32 V100 GPUs on our internal cluster. ... During the evolutionary search, we keep a population size of 500 for 1000 generations. For each generation, once we identify the Pareto population based on nondominated sorting and crowding distance, we breed new genotypes through crossover and mutation with a crossover probability of 0.007 and a mutation probability of 0.02. (Sketches of the training step and the search loop follow the table.) |
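
The quoted training setup maps onto a standard gradient-clipped SGD step. Below is a minimal PyTorch sketch: the 0.08 learning rate and the global-norm clip at 500 are quoted from the paper, while the choice of SGD, the momentum value, and the toy model and data are assumptions for illustration only.

```python
import torch
from torch import nn

INIT_LR = 0.08          # initial learning rate (quoted above)
GRAD_CLIP_NORM = 500.0  # global gradient-norm clip (quoted above)

def train_step(model: nn.Module, optimizer: torch.optim.Optimizer,
               images: torch.Tensor, labels: torch.Tensor) -> float:
    """One supervised training step with the quoted LR and clipping."""
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(images), labels)
    loss.backward()
    # Clip the global norm of the gradient at 500, as the paper states.
    torch.nn.utils.clip_grad_norm_(model.parameters(), GRAD_CLIP_NORM)
    optimizer.step()
    return loss.item()

# Toy usage with a dummy model; the actual runs train a quantized supernet
# with batch size 2048 across 32 V100 GPUs.
model = nn.Linear(8, 10)
opt = torch.optim.SGD(model.parameters(), lr=INIT_LR, momentum=0.9)  # momentum assumed
x, y = torch.randn(4, 8), torch.randint(0, 10, (4,))
print(train_step(model, opt, x, y))
```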
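
The Dataset Splits row describes how the accuracy predictor's training data is collected: sample quantized subnets and measure each on a 10k holdout drawn from the ImageNet training images. A minimal sketch, assuming placeholder `sample_quantized_subnet` and `measure_accuracy` helpers, since the paper's genotype encoding and evaluation pipeline are not specified here:

```python
import random

N_SUBNETS, N_HOLDOUT = 16_000, 10_000  # counts quoted in the Dataset Splits row

def sample_quantized_subnet():
    """Placeholder: a random architecture/bit-width genotype."""
    return tuple(random.randint(0, 3) for _ in range(20))

def measure_accuracy(genotype):
    """Placeholder for evaluating the subnet on the 10k held-out
    training images; returns a dummy score here."""
    return random.random()

# (genotype, accuracy) pairs used to fit the accuracy predictor.
predictor_data = [(g, measure_accuracy(g))
                  for g in (sample_quantized_subnet() for _ in range(N_SUBNETS))]
print(len(predictor_data))
```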
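
The evolutionary search in the Experiment Setup row is an NSGA-II-style loop: identify a Pareto population via nondominated sorting and crowding distance, then breed replacements by crossover and mutation. The sketch below keeps the quoted population size, generation count, and probabilities; the genotype encoding, the objective functions, and the per-gene reading of the crossover/mutation probabilities are assumptions, and crowding distance is omitted for brevity.

```python
import random

# Quoted settings from the table; everything else below is a placeholder.
POP_SIZE, GENERATIONS = 500, 1000
P_CROSS, P_MUT = 0.007, 0.02
GENE_CHOICES, N_GENES = [0, 1, 2, 3], 20  # e.g. per-layer op/bit-width ids

def evaluate(g):
    """Placeholder objectives: (accuracy to maximize, FLOPs to minimize)."""
    return (sum(g) + random.random(), sum(x * x for x in g) + 1.0)

def dominates(a, b):
    """True if score a is Pareto-better than score b."""
    return a[0] >= b[0] and a[1] <= b[1] and (a[0] > b[0] or a[1] < b[1])

def pareto_front(pop, scores):
    """First nondominated front; the paper additionally uses crowding distance."""
    return [pop[i] for i in range(len(pop))
            if not any(dominates(scores[j], scores[i])
                       for j in range(len(pop)) if j != i)]

def breed(a, b):
    """Per-gene crossover then mutation (interpretation assumed)."""
    child = [y if random.random() < P_CROSS else x for x, y in zip(a, b)]
    return [random.choice(GENE_CHOICES) if random.random() < P_MUT else x
            for x in child]

def search(pop_size=POP_SIZE, generations=GENERATIONS):
    pop = [[random.choice(GENE_CHOICES) for _ in range(N_GENES)]
           for _ in range(pop_size)]
    for _ in range(generations):
        scores = [evaluate(g) for g in pop]
        parents = pareto_front(pop, scores)
        pop = [breed(random.choice(parents), random.choice(parents))
               for _ in range(pop_size)]
    return pareto_front(pop, [evaluate(g) for g in pop])

# Quick toy run; the paper's full settings correspond to search() defaults,
# which are slow in pure Python.
print(len(search(pop_size=50, generations=20)))
```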