T-Basis: a Compact Representation for Neural Networks
Authors: Anton Obukhov, Maxim Rakhuba, Stamatios Georgoulis, Menelaos Kanakis, Dengxin Dai, Luc Van Gool
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the proposed approach on the task of neural network compression and demonstrate that it reaches high compression rates at acceptable performance drops. Finally, we analyze memory and operation requirements of the compressed networks and conclude that T-Basis networks are equally well suited for training and inference in resource-constrained environments and usage on the edge devices. In this section, we investigate the limits of neural network compression with the T-Basis parameterization. Most of the results are shown on the task of image classification, and a smaller part on the semantic image segmentation. |
| Researcher Affiliation | Academia | Anton Obukhov 1 Maxim Rakhuba 1 Stamatios Georgoulis 1 Menelaos Kanakis 1 Dengxin Dai 1 Luc Van Gool 1 2 1ETH Zurich 2KU Leuven. Correspondence to: Anton Obukhov <anton.obukhov@vision.ee.ethz.ch>. |
| Pseudocode | No | The paper describes methods and mathematical formulations but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Project website: obukhov.ai/tbasis. |
| Open Datasets | Yes | We follow the training protocol explained in (Wang et al., 2018): 20 epochs, batch size 128, network architecture with two convolutional layers with 20 and 50 output channels respectively, and two linear layers with 320 and 10 output channels, a total of 429K in uncompressed parameters. Fig. 3 demonstrates performance of the specified LeNet-5 architecture (LeCun et al., 1998) and T-Basis parameterization with various basis sizes and ranks. CIFAR-10 (Krizhevsky & Hinton, 2009)... This dataset consists of 60K RGB images of size 32×32, split into 50K train and 10K test splits. Pascal VOC (Everingham et al., 2010) and SBD (Hariharan et al., 2011) datasets are often used together as a benchmark for the semantic segmentation task. They are composed of 10582 photos in the training and 1449 photos in the validation splits and their respective dense semantic annotations with 21 classes. |
| Dataset Splits | Yes | Pascal VOC (Everingham et al., 2010) and SBD (Hariharan et al., 2011) datasets... They are composed of 10582 photos in the training and 1449 photos in the validation splits and their respective dense semantic annotations with 21 classes. |
| Hardware Specification | No | All experiments were implemented in PyTorch and configured to fit entirely into one conventional GPU with 12 or 16GB of memory during training. We thank NVIDIA for GPU donations, and Amazon Activate for EC2 credits. Computations were also done on the Leonhard cluster at ETH Zurich; special thanks to Andreas Lugmayr for making it happen. The specific GPU models, EC2 instance types, or cluster specifications are not provided. |
| Software Dependencies | No | All experiments were implemented in PyTorch and configured to fit entirely into one conventional GPU with 12 or 16GB of memory during training. The paper mentions 'PyTorch' but does not specify a version number or other software dependencies with their versions. |
| Experiment Setup | Yes | We follow the training protocol explained in (Wang et al., 2018): 20 epochs, batch size 128. We train our experiments until convergence for 500 epochs with batch size 128, initial learning rate 0.1, and 50%-75% step LR schedule with gamma 0.1. For the rest of the learned parameters, we utilize the Adam (Kingma & Ba, 2015) optimizer in classification tasks and SGD in semantic segmentation. For Adam, we found that 0.003 is an appropriate learning rate for most of the combinations of hyperparameters. We noticed that larger sizes of basis require lower learning rates in all cases. We perform linear LR warm-up for 2000 steps in all experiments, and gradient clipping with L2 norm at 5.0. We train twice as long as the standard training protocol for 180K steps, original polynomial LR, batch size 16, crops 384×384, without preloading ImageNet weights to make comparison with the baseline fair. |
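The quoted classification schedule (initial LR 0.1, linear warm-up for 2000 steps, step decay by gamma 0.1 at 50% and 75% of training) can be sketched as a standalone function. How the warm-up composes with the step decay is our assumption; the paper states both pieces but not their exact combination:

```python
def lr_at_step(step, total_steps, base_lr=0.1, warmup_steps=2000, gamma=0.1):
    """Sketch of the paper's stated schedule: linear LR warm-up for
    2000 steps, then a 50%-75% step decay with gamma 0.1.
    The precise interaction of warm-up and decay is an assumption."""
    if step < warmup_steps:
        # Linear warm-up from ~0 to base_lr over the first 2000 steps.
        return base_lr * (step + 1) / warmup_steps
    if step < 0.5 * total_steps:
        return base_lr
    if step < 0.75 * total_steps:
        return base_lr * gamma
    return base_lr * gamma ** 2
```

For the reported CIFAR-10 runs (500 epochs, batch size 128, 50K training images), `total_steps` would be roughly 500 × ⌈50000 / 128⌉ ≈ 195,500.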