Wavelet Feature Maps Compression for Image-to-Image CNNs
Authors: Shahaf E. Finder, Yair Zohav, Maor Ashkenazi, Eran Treister
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experiment with various tasks that benefit from high-resolution input. By combining WCC with light quantization, we achieve compression rates equivalent to 1-4 bit activation quantization with relatively small and much more graceful degradation in performance. Our code is available at https://github.com/BGUCompSci/WaveletCompressedConvolution. Section 5 is dedicated to experimental evaluation, presenting results in tables (e.g., Table 1, 2, 3, 4) and figures (e.g., Figure 1, 3, 4, 5) across multiple datasets and tasks, including object detection, semantic segmentation, monocular depth estimation, and super-resolution. |
| Researcher Affiliation | Academia | Shahaf E. Finder, Yair Zohav, Maor Ashkenazi, Eran Treister. The Department of Computer Science, Ben-Gurion University. [finders,maorash]@post.bgu.ac.il, erant@cs.bgu.ac.il |
| Pseudocode | Yes | The workflow is illustrated in Figure 2, and an explicit algorithm appears in Appendix B. Appendix B is titled 'Explicit Algorithm'. |
| Open Source Code | Yes | Our code is available at https://github.com/BGUCompSci/WaveletCompressedConvolution. |
| Open Datasets | Yes | We train and evaluate the networks on the MS COCO 2017 [39] object detection dataset. We evaluated our proposed method on the Cityscapes and Pascal VOC datasets. The Cityscapes dataset [12]... The Pascal VOC [19] dataset... We evaluated the results on the KITTI dataset [21]... For this task, we chose the popular EDSR network [38], trained on the DIV2K dataset [1]. |
| Dataset Splits | Yes | The MS COCO 2017 dataset contains 118K training images and 5K validation images. For Cityscapes, 'During training, we used a random crop of size 768×768 and no crop for the validation set.' For Monodepth2, 'The train/validation split is the default selected by Monodepth2 (based on [71]), and we evaluate it on the ground truths provided by the KITTI depth benchmark.' |
| Hardware Specification | Yes | We ran our experiments on NVIDIA 24GB RTX 3090 GPU. |
| Software Dependencies | No | The paper states 'We implemented our code using PyTorch [47], based on Torchvision and public implementations of the chosen networks.' However, it does not provide specific version numbers for PyTorch or Torchvision, which are required for a reproducible description of software dependencies. |
| Experiment Setup | Yes | For object detection, 'We use the AdamW optimizer, with a learning rate of 10⁻³ when initially applying WCC layers and 10⁻⁴ for finetuning. In addition, we apply a learning rate warm-up in the first epoch of training, followed by a cosine learning rate decay. Each compression step is finetuned for 20 to 40 epochs.' Similar detailed settings are provided for semantic segmentation and monocular depth estimation in their respective sections. |
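The training schedule quoted in the Experiment Setup row (AdamW, one warm-up epoch, then cosine decay) can be sketched in PyTorch. This is a minimal illustration, not the authors' code: the model, the number of steps per epoch, and the warm-up start factor are placeholder assumptions; the learning rate, warm-up-then-cosine shape, and 20-epoch fine-tuning length come from the paper's description.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LinearLR, CosineAnnealingLR, SequentialLR

# Placeholder model standing in for a network with WCC layers applied.
model = torch.nn.Conv2d(3, 16, 3)

epochs = 20          # paper: each compression step is finetuned for 20-40 epochs
steps_per_epoch = 100  # assumption; depends on dataset size and batch size

# Paper: lr 1e-3 when initially applying WCC layers, 1e-4 for finetuning.
optimizer = AdamW(model.parameters(), lr=1e-3)

# Linear warm-up over the first epoch, then cosine decay for the rest.
warmup = LinearLR(optimizer, start_factor=0.01, total_iters=steps_per_epoch)
cosine = CosineAnnealingLR(optimizer, T_max=(epochs - 1) * steps_per_epoch)
scheduler = SequentialLR(optimizer, schedulers=[warmup, cosine],
                         milestones=[steps_per_epoch])

for step in range(epochs * steps_per_epoch):
    optimizer.step()   # forward/backward pass would precede this in real training
    scheduler.step()
```

`SequentialLR` hands control from the warm-up schedule to the cosine schedule at the first-epoch boundary, matching the quoted description; the same skeleton would be re-run with `lr=1e-4` for the finetuning stage.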