Bit-Pruning: A Sparse Multiplication-Less Dot-Product
Authors: Yusuke Sekikawa, Shingo Yashima
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In extensive experiments, we demonstrate that sparse mult-less networks trained with Bit-Pruning show a better accuracy-energy trade-off than sparse mult networks trained with Weight-Pruning. |
| Researcher Affiliation | Industry | Yusuke Sekikawa, Shingo Yashima DENSO IT LABORATORY, INC., Tokyo, Japan {sekikawa.yusuke, yashima.shingo}@core.d-itlab.co.jp |
| Pseudocode | No | The paper describes procedures in text and uses figures to illustrate concepts (e.g., Figure 1, Figure 2) but does not include any explicitly labeled pseudocode blocks or algorithms. |
| Open Source Code | Yes | (Code is available at https://github.com/DensoITLab/bitprune) |
| Open Datasets | Yes | We evaluated the accuracy/energy trade-off of both methods on CIFAR-10 (Krizhevsky et al., a), CIFAR-100 (Krizhevsky et al., b), and ImageNet (Deng et al., 2009). |
| Dataset Splits | No | The paper mentions training epochs and batch sizes for CIFAR-10, CIFAR-100, and ImageNet in Table 3, and refers to 'test accuracy' in the results, but does not explicitly state the training, validation, and test splits (e.g., as percentages or sample counts) for these datasets. |
| Hardware Specification | No | The paper includes Table 1 which lists energy and area costs for ASIC and FPGA technologies for different operations, but this refers to the target hardware for deployment/analysis, not the specific hardware (e.g., GPU/CPU models, memory) used to conduct the experiments described in the paper. |
| Software Dependencies | No | Table 3 mentions 'Optimizer AdamW (Loshchilov & Hutter, 2019)' and 'LSQ (Esser et al., 2020)', and the paper cites 'PyTorch (Paszke et al., 2019)', but specific version numbers for PyTorch or other libraries used in the experimental setup are not provided. |
| Experiment Setup | Yes | Table 3: Experimental setup. Network: ResNet18 (CIFAR-10, CIFAR-100), ConvNeXt-B (ImageNet); Batch size: 512 (CIFAR), 256 (ImageNet); Training epochs: 200 (CIFAR), 100 (ImageNet); Optimizer: AdamW (Loshchilov & Hutter, 2019); Scheduler: OneCycle (Smith & Topin, 2019) for CIFAR, cosine decay (Loshchilov & Hutter, 2017) for ImageNet; Weight quantization M: 8; Activation quantization N: 4/8/32 (CIFAR), 8 (ImageNet); Weight initialization: Kaiming-uniform (He et al., 2015) for CIFAR, pretrained for ImageNet; λ(l) for Cmove (Bit-Pruning only): 1.0 for all layers (Figure 3c). |
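
As a readability aid, the CIFAR portion of the Table 3 configuration can be written as a short PyTorch sketch. This is a minimal sketch under stated assumptions, not the authors' released code: the Bit-Pruning regularizer (λ(l) for Cmove), the LSQ weight/activation quantizers, and the actual learning rate and weight decay are omitted or guessed.

```python
# Sketch of the CIFAR-10/100 setup reported in Table 3, assuming a standard
# PyTorch pipeline. Bit-Pruning-specific terms (the Cmove penalty weighted by
# lambda^(l) = 1.0 per layer) and LSQ quantization are NOT reproduced here;
# lr and weight_decay values are illustrative assumptions, not from the paper.
import torch
import torchvision

EPOCHS = 200                             # CIFAR-10 / CIFAR-100 (ImageNet uses 100)
BATCH_SIZE = 512                         # CIFAR (ImageNet uses 256)
STEPS_PER_EPOCH = 50_000 // BATCH_SIZE   # CIFAR training-set size / batch size

model = torchvision.models.resnet18(num_classes=10)  # ResNet18 backbone for CIFAR-10

# Kaiming-uniform weight initialization (He et al., 2015), as listed in Table 3.
for m in model.modules():
    if isinstance(m, (torch.nn.Conv2d, torch.nn.Linear)):
        torch.nn.init.kaiming_uniform_(m.weight)

# AdamW optimizer (Loshchilov & Hutter, 2019) with a OneCycle learning-rate
# schedule (Smith & Topin, 2019), per Table 3; hyperparameter values assumed.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=5e-2)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer,
    max_lr=1e-3,
    epochs=EPOCHS,
    steps_per_epoch=STEPS_PER_EPOCH,
)
```

For the ImageNet configuration, the sketch would instead start from a pretrained ConvNeXt-B, use a batch size of 256 for 100 epochs, and swap the OneCycle schedule for cosine decay (Loshchilov & Hutter, 2017).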