Understanding Neural Network Binarization with Forward and Backward Proximal Quantizers
Authors: Yiwei Lu, Yaoliang Yu, Xinlin Li, Vahid Partovi Nia
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct image classification experiments on CNNs and vision transformers, and empirically verify that BNN++ generally achieves competitive results on binarizing these models. |
| Researcher Affiliation | Collaboration | Yiwei Lu, School of Computer Science, University of Waterloo & Vector Institute (yiwei.lu@uwaterloo.ca); Yaoliang Yu, School of Computer Science, University of Waterloo & Vector Institute (yaoliang.yu@uwaterloo.ca); Xinlin Li, Huawei Noah's Ark Lab (xinlin.li1@huawei.com); Vahid Partovi Nia, Huawei Noah's Ark Lab (vahid.partovinia@huawei.com) |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. Figure 1 shows a visualization of forward and backward passes, but it is not pseudocode. |
| Open Source Code | No | The paper does not include an explicit statement or link providing access to the open-source code for their proposed methodology. |
| Open Datasets | Yes | Datasets: We perform image classification on CIFAR-10/100 datasets [27] and ImageNet-1K dataset [28]. |
| Dataset Splits | No | The paper uses standard datasets (CIFAR-10/100, ImageNet-1K), which implies standard splits, but it does not explicitly state the training/validation/test dataset splits (e.g., percentages or sample counts). |
| Hardware Specification | Yes | Hardware and package: All experiments were run on a GPU cluster with NVIDIA V100 GPUs. |
| Software Dependencies | No | The paper mentions 'The platform we use is PyTorch. Specifically, we apply ViT and DeiT models implemented in PyTorch Image Models (timm)', but it does not specify version numbers for PyTorch or timm. |
| Experiment Setup | Yes | Hyperparameters: We apply the same training hyperparameters and fine-tune/end-to-end training for 100/300 epochs across all models. For binarization methods: (1) PQ (ProxQuant): similar to Bai et al. [2], we apply the Linear Quantizer (LQ), see (10) in Appendix A, with initial ρ0 = 0.01 and linearly increase to ρT = 10; (2) rPC (reverse ProxConnect): we use the same LQ for rPC; (3) ProxConnect++: for PC, we apply the same LQ; for BNN+, we choose µ = 5 (no need to increase µ as the forward quantizer is sign); for BNN++, we choose µ0 = 5 and linearly increase to µT = 30 to achieve binarization at the final step. |
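The software-dependencies row notes that the ViT/DeiT backbones come from PyTorch Image Models (timm), with no version pins. For readers reconstructing the environment, the snippet below is a minimal sketch of loading comparable backbones through timm's public `create_model` API; the specific model tags (`vit_small_patch16_224`, `deit_small_patch16_224`) are assumptions, since the paper does not list the exact variants or the timm version used.

```python
import timm   # PyTorch Image Models; version not specified in the paper
import torch

# Hypothetical backbone choices: the paper binarizes ViT and DeiT models via timm,
# but does not state which exact variants were used.
vit = timm.create_model("vit_small_patch16_224", pretrained=True, num_classes=1000)
deit = timm.create_model("deit_small_patch16_224", pretrained=True, num_classes=1000)

# Quick shape check on a dummy ImageNet-sized batch.
x = torch.randn(2, 3, 224, 224)
print(vit(x).shape, deit(x).shape)  # expected: torch.Size([2, 1000]) for each model
```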
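The experiment-setup row quotes linear schedules for the proximal hyperparameters (ρ ramped from 0.01 to 10 for PQ/rPC/PC, µ ramped from 5 to 30 for BNN++). Since the paper's code is not released, the following is only a minimal sketch of such per-epoch linear ramps, assuming a 100-epoch budget; `linear_ramp` and the hard-tanh-style `lq_surrogate` placeholder are illustrative names, not the paper's Linear Quantizer from equation (10).

```python
import torch

def linear_ramp(start: float, end: float, epoch: int, total_epochs: int) -> float:
    """Linearly interpolate a scheduled hyperparameter from `start` to `end`."""
    frac = min(epoch / max(total_epochs - 1, 1), 1.0)
    return start + frac * (end - start)

def lq_surrogate(w: torch.Tensor, mu: float) -> torch.Tensor:
    """Hypothetical stand-in for a forward quantizer: hard-tanh-style clipping
    that approaches sign(w) as mu grows (hard binarization at large mu)."""
    return torch.clamp(mu * w, -1.0, 1.0)

weights = torch.randn(16, 16)  # dummy full-precision weights for illustration
epochs = 100                   # the table quotes 100-epoch fine-tuning (300 for end-to-end)
for epoch in range(epochs):
    rho = linear_ramp(0.01, 10.0, epoch, epochs)  # PQ/rPC/PC: rho_0 = 0.01 -> rho_T = 10
    mu = linear_ramp(5.0, 30.0, epoch, epochs)    # BNN++: mu_0 = 5 -> mu_T = 30
    w_q = lq_surrogate(weights, mu)               # forward pass on (surrogate-)binarized weights
    # ... a ProxConnect-style training step would use `rho` in its proximal update here;
    # BNN+ instead keeps mu fixed and uses torch.sign(weights) as the forward quantizer.
```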