Bit-Pruning: A Sparse Multiplication-Less Dot-Product

Authors: Yusuke Sekikawa, Shingo Yashima

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In extensive experiments, we demonstrate that sparse mult-less networks trained with Bit-Pruning show a better accuracy-energy trade-off than sparse mult networks trained with Weight-Pruning. (See the shift-and-add sketch after this table.)
Researcher Affiliation | Industry | Yusuke Sekikawa, Shingo Yashima; DENSO IT LABORATORY, INC., Tokyo, Japan; {sekikawa.yusuke, yashima.shingo}@core.d-itlab.co.jp
Pseudocode | No | The paper describes procedures in text and uses figures to illustrate concepts (e.g., Figure 1, Figure 2), but does not include any explicitly labeled pseudocode blocks or algorithms.
Open Source Code | Yes | Code is available at https://github.com/DensoITLab/bitprune
Open Datasets | Yes | We evaluated the accuracy/energy trade-off of both methods on CIFAR-10 (Krizhevsky et al., a), CIFAR-100 (Krizhevsky et al., b), and ImageNet (Deng et al., 2009).
Dataset Splits | No | The paper mentions training epochs and batch sizes for CIFAR-10, CIFAR-100, and ImageNet in Table 3 and refers to 'test accuracy' in the results, but it does not explicitly state the training/validation/test splits (e.g., as percentages or sample counts) for these datasets.
Hardware Specification | No | The paper includes Table 1, which lists energy and area costs for ASIC and FPGA technologies for different operations, but this refers to the target hardware for deployment/analysis, not the specific hardware (e.g., GPU/CPU models, memory) used to conduct the experiments described in the paper.
Software Dependencies | No | Table 3 mentions 'Optimizer AdamW (Loshchilov & Hutter, 2019)' and 'LSQ (Esser et al., 2020)', and the paper cites PyTorch (Paszke et al., 2019), but specific version numbers for PyTorch or other libraries used in the experimental setup are not provided.
Experiment Setup | Yes | Table 3 (Experimental setup). Network: ResNet18 (CIFAR-10), ResNet18 (CIFAR-100), ConvNeXt-B (ImageNet). Batch size: 512 (CIFAR-10/100), 256 (ImageNet). Training epochs: 200 (CIFAR-10/100), 100 (ImageNet). Optimizer: AdamW (Loshchilov & Hutter, 2019). Scheduler: OneCycle (Smith & Topin, 2019) for CIFAR-10/100; cosine decay (Loshchilov & Hutter, 2017) for ImageNet. Weight quantization M: 8. Activation quantization N: 4/8/32 (CIFAR-10/100), 8 (ImageNet). Weight initialization: Kaiming-uniform (He et al., 2015) for CIFAR-10/100; pretrained weights for ImageNet. λ(l) for C_move (Bit-Pruning only): 1.0 for all layers (Figure 3c). (An illustrative configuration sketch based on this table appears directly below.)
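
For orientation, the Table 3 settings quoted above can be expressed as a small Python configuration dictionary. This is a minimal sketch, assuming that the spanned cells apply to both CIFAR columns as reconstructed in the row above; the key names are illustrative and are not identifiers from the authors' code, and only the values come from the table.

```python
# Illustrative reconstruction of Table 3 (key names are assumptions; values are
# taken from the quoted table, with spanned cells applied to both CIFAR datasets).
EXPERIMENT_SETUP = {
    "cifar10": {
        "network": "ResNet18",
        "batch_size": 512,
        "epochs": 200,
        "optimizer": "AdamW",               # Loshchilov & Hutter, 2019
        "scheduler": "OneCycle",            # Smith & Topin, 2019
        "weight_quantization_M": 8,
        "activation_quantization_N": (4, 8, 32),
        "weight_init": "kaiming_uniform",   # He et al., 2015
        "lambda_cmove": 1.0,                # Bit-Pruning only, all layers
    },
    "cifar100": {
        "network": "ResNet18",
        "batch_size": 512,
        "epochs": 200,
        "optimizer": "AdamW",
        "scheduler": "OneCycle",
        "weight_quantization_M": 8,
        "activation_quantization_N": (4, 8, 32),
        "weight_init": "kaiming_uniform",
        "lambda_cmove": 1.0,
    },
    "imagenet": {
        "network": "ConvNeXt-B",
        "batch_size": 256,
        "epochs": 100,
        "optimizer": "AdamW",
        "scheduler": "cosine_decay",        # Loshchilov & Hutter, 2017
        "weight_quantization_M": 8,
        "activation_quantization_N": 8,
        "weight_init": "pretrained",
        "lambda_cmove": 1.0,
    },
}
```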
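
The title and the Research Type row describe a sparse, multiplication-less ("mult-less") dot product. As a general illustration of how a multiply can be replaced by shifts and adds once an integer weight is decomposed into signed power-of-two terms, here is a minimal plain-Python sketch. It is not the authors' algorithm or code (function names such as `multless_dot` are hypothetical); it only shows the shift-and-add principle under which dropping individual bits reduces the number of additions.

```python
def power_of_two_terms(w: int) -> list[tuple[int, int]]:
    """Decompose an integer weight into signed power-of-two terms.

    Example: 6 -> [(+1, 1), (+1, 2)] because 6 = 2**1 + 2**2 (binary 110).
    A negative weight reuses its magnitude's terms with a negative sign.
    """
    sign = 1 if w >= 0 else -1
    mag = abs(w)
    terms, bit = [], 0
    while mag:
        if mag & 1:
            terms.append((sign, bit))
        mag >>= 1
        bit += 1
    return terms


def multless_dot(x: list[int], w: list[int]) -> int:
    """Dot product of integer activations x and integer weights w using only
    shifts and adds on the data values (no multiplications)."""
    acc = 0
    for xi, wi in zip(x, w):
        for sign, shift in power_of_two_terms(wi):
            # Each set bit of the weight costs one shift-and-add; fewer bits
            # (bit-level sparsity) means fewer additions.
            if sign > 0:
                acc += xi << shift
            else:
                acc -= xi << shift
    return acc


# Sanity check against an ordinary multiply-accumulate dot product.
x = [3, -1, 4, 2]
w = [6, 5, -3, 0]   # zero weights and pruned bits contribute no adds at all
assert multless_dot(x, w) == sum(a * b for a, b in zip(x, w))
```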