Wavelet Feature Maps Compression for Image-to-Image CNNs

Authors: Shahaf E. Finder, Yair Zohav, Maor Ashkenazi, Eran Treister

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 'We experiment with various tasks that benefit from high-resolution input. By combining WCC with light quantization, we achieve compression rates equivalent to 1-4 bit activation quantization with relatively small and much more graceful degradation in performance. Our code is available at https://github.com/BGUCompSci/WaveletCompressedConvolution.' Section 5 is dedicated to experimental evaluation, presenting results in tables (Tables 1, 2, 3, and 4) and figures (Figures 1, 3, 4, and 5) across multiple datasets and tasks, including object detection, semantic segmentation, monocular depth estimation, and super-resolution.
Researcher Affiliation | Academia | Shahaf E. Finder, Yair Zohav, Maor Ashkenazi, Eran Treister. The Department of Computer Science, Ben-Gurion University. [finders,maorash]@post.bgu.ac.il, erant@cs.bgu.ac.il
Pseudocode | Yes | The workflow is illustrated in Figure 2, and an explicit algorithm appears in Appendix B, which is titled 'Explicit Algorithm'. (A rough sketch of the wavelet-compression idea is given after this table.)
Open Source Code | Yes | 'Our code is available at https://github.com/BGUCompSci/WaveletCompressedConvolution.'
Open Datasets | Yes | 'We train and evaluate the networks on the MS COCO 2017 [39] object detection dataset.' 'We evaluated our proposed method on the Cityscapes and Pascal VOC datasets.' 'The Cityscapes dataset [12]...' 'The Pascal VOC [19] dataset...' 'We evaluated the results on the KITTI dataset [21]...' 'For this task, we chose the popular EDSR network [38], trained on the DIV2K dataset [1].'
Dataset Splits | Yes | The MS COCO 2017 dataset contains 118K training images and 5K validation images. For Cityscapes, 'During training, we used a random crop of size 768×768 and no crop for the validation set.' For Monodepth2, 'The train/validation split is the default selected by Monodepth2 (based on [71]), and we evaluate it on the ground truths provided by the KITTI depth benchmark.'
Hardware Specification | Yes | We ran our experiments on an NVIDIA 24GB RTX 3090 GPU.
Software Dependencies | No | The paper states 'We implemented our code using PyTorch [47], based on Torchvision and public implementations of the chosen networks.' However, it does not provide specific version numbers for PyTorch or Torchvision, which are required for a reproducible description of software dependencies.
Experiment Setup | Yes | For object detection, 'We use the AdamW optimizer, with a learning rate of 10^-3 when initially applying WCC layers and 10^-4 for finetuning. In addition, we apply a learning rate warm-up in the first epoch of training, followed by a cosine learning rate decay. Each compression step is finetuned for 20 to 40 epochs.' Similar detailed settings are provided for semantic segmentation and monocular depth estimation in their respective sections. (A minimal sketch of this optimizer and schedule follows the wavelet sketch below.)
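
Appendix B of the paper gives the authors' explicit algorithm; the sketch below is only a rough, hand-rolled illustration of the underlying idea (wavelet-transforming a feature map and discarding small detail coefficients), not the paper's WCC layer, which performs the convolution itself in the wavelet domain. The single-level Haar transform and the threshold value are assumptions made purely for illustration.

    import torch

    def haar_dwt2(x):
        # Single-level 2D Haar DWT of a feature map x of shape (N, C, H, W),
        # with H and W even. Returns subbands LL, LH, HL, HH at half resolution.
        a = x[..., 0::2, 0::2]
        b = x[..., 0::2, 1::2]
        c = x[..., 1::2, 0::2]
        d = x[..., 1::2, 1::2]
        ll = (a + b + c + d) / 2
        lh = (a + b - c - d) / 2
        hl = (a - b + c - d) / 2
        hh = (a - b - c + d) / 2
        return ll, lh, hl, hh

    def haar_idwt2(ll, lh, hl, hh):
        # Exact inverse of haar_dwt2.
        a = (ll + lh + hl + hh) / 2
        b = (ll + lh - hl - hh) / 2
        c = (ll - lh + hl - hh) / 2
        d = (ll - lh - hl + hh) / 2
        n, ch, h, w = ll.shape
        x = ll.new_zeros(n, ch, 2 * h, 2 * w)
        x[..., 0::2, 0::2] = a
        x[..., 0::2, 1::2] = b
        x[..., 1::2, 0::2] = c
        x[..., 1::2, 1::2] = d
        return x

    def compress_feature_map(x, threshold=0.1):
        # Zero out small detail coefficients; keep the low-pass band intact.
        ll, lh, hl, hh = haar_dwt2(x)
        lh, hl, hh = [torch.where(s.abs() > threshold, s, torch.zeros_like(s))
                      for s in (lh, hl, hh)]
        return haar_idwt2(ll, lh, hl, hh)

    x = torch.randn(1, 64, 128, 128)       # dummy feature map
    x_hat = compress_feature_map(x, 0.2)   # lossy, sparser reconstruction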
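
The optimizer and schedule quoted in the Experiment Setup row translate almost directly into PyTorch. The following is a minimal sketch under assumed step counts and a stand-in model; only the AdamW choice, the 1e-3/1e-4 learning rates, the one-epoch warm-up, and the cosine decay come from the paper.

    import math
    import torch
    from torch.optim import AdamW
    from torch.optim.lr_scheduler import LambdaLR

    model = torch.nn.Conv2d(3, 16, 3)    # stand-in for the actual network
    epochs, steps_per_epoch = 30, 1000   # assumed; the paper finetunes 20-40 epochs

    # 1e-3 when initially applying WCC layers, 1e-4 for finetuning (per the quote).
    optimizer = AdamW(model.parameters(), lr=1e-3)

    def warmup_cosine(step):
        # Linear warm-up over the first epoch, then cosine decay to zero.
        warmup_steps = steps_per_epoch
        total_steps = epochs * steps_per_epoch
        if step < warmup_steps:
            return step / max(1, warmup_steps)
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return 0.5 * (1.0 + math.cos(math.pi * progress))

    scheduler = LambdaLR(optimizer, lr_lambda=warmup_cosine)

    # In the training loop: loss.backward(); optimizer.step(); scheduler.step()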