Neural Network Approximation based on Hausdorff distance of Tropical Zonotopes

Authors: Panagiotis Misiakos, Georgios Smyrnis, George Retsinas, Petros Maragos

ICLR 2022

Reproducibility Variable Result LLM Response
Research Type Experimental In this work we theoretically contribute to neural network approximation by providing a novel tropical geometrical viewpoint to structured neural network compression. In particular, we show that the approximation error between two neural networks with ReLU activations and one hidden layer depends on the Hausdorff distance of the tropical zonotopes of the networks. This theorem comes as a first step towards a purely geometrical interpretation of neural network approximation. Based on this theoretical contribution, we propose geometrical methods that employ the K-means algorithm to compress the fully connected parts of ReLU activated deep neural networks. We analyze the error bounds of our algorithms theoretically based on our approximation theorem and evaluate them empirically on neural network compression. Our experiments follow a proof-of-concept strategy and indicate that our geometrical tools achieve improved performance over relevant tropical geometry techniques and can be competitive against non-tropical methods.
Researcher Affiliation Academia ETH Zurich, University of Texas at Austin, National Technical University of Athens
Pseudocode Yes Algorithm 1: Zonotope K-means Compression; Algorithm 2: Neural Path K-means Compression (an illustrative sketch of the underlying K-means compression idea appears after the table).
Open Source Code No The paper does not provide any explicit statement about open-source code availability or a link to a code repository for the described methodology.
Open Datasets Yes MNIST dataset, pairs 3/5 and 4/9: The first experiment is performed on the binary classification tasks of pairs 3/5 and 4/9 of the MNIST dataset, so both of our proposed methods can be used. In Table 1, we compare our methods with the tropical geometrical approach of Smyrnis et al. (2020), whose method is based on a tropical division framework for network minimization. For a fair comparison, we use the same CNN with two fully connected layers and a hidden layer of size 1000. According to Table 1, our techniques have similar performance: they retain the accuracy of the network while reducing its size. Moreover, in Table 2 we include experimental computations of the theoretical bounds provided by Propositions 5 and 6. We notice that the bounds decrease as fewer weights are retained. We expected the bounds to grow, since using fewer weights makes the compression coarser and the error larger; instead, the opposite holds, meaning the bounds are tighter at higher pruning rates. The bounds also become 0 when all the weights are kept, as expected.
MNIST and Fashion-MNIST datasets: For the second experiment we employ the MNIST and Fashion-MNIST datasets. The corresponding classification is multiclass, so only Neural Path K-means can be applied. In Table 3, we compare it with the multiclass tropical method of Smyrnis & Maragos (2020) using the same CNN architecture they do. Furthermore, in plots 4a, 4b we compare Neural Path K-means with ThiNet and baseline pruning methods by compressing LeNet-5 (LeCun et al., 1998). To see how our method performs in deeper architectures, plots 4c, 4d illustrate the compression of a deep neural network with layers of size 28×28, 512, 256, 128 and 10, which we refer to as the deep NN (an illustrative sketch of this architecture appears after the table). The compression is executed on all hidden layers, beginning from the input and moving toward the output. From Table 3, we deduce that our method performs better than Smyrnis & Maragos (2020). It also achieves higher accuracy scores and exhibits lower variance, as shown in plots 4a-4d. Overall, Neural Path K-means performs well, even competitively with ThiNet. Its worst performance occurs at low percentages of remaining weights; a likely explanation is that K-means provides a high-quality compression only as long as the number of centers is not less than the number of 'real' clusters.
CIFAR datasets: We conduct our final experiment on the CIFAR datasets using CIFAR-VGG (Blalock et al., 2020) and a version of AlexNet adapted for CIFAR. The resulting plots are shown in Fig. 4e-4h. Neural Path K-means retains good performance on these larger datasets: in most cases it has slightly better accuracy and lower deviation than the baselines, but behaves worse when keeping almost zero weights.
Dataset Splits No The paper does not explicitly provide details about training/validation/test splits, such as specific percentages, sample counts, or references to predefined splits for reproduction purposes.
Hardware Specification No The paper does not explicitly describe the hardware used for running its experiments, such as specific GPU or CPU models.
Software Dependencies No The paper does not provide a reproducible description of ancillary software, as it does not list specific version numbers for key software components or specialized packages used in the experiments.
Experiment Setup No The paper describes the general approach and network architectures (e.g., 'CNN with two fully connected layers and hidden layer of size 1000', 'LeNet-5', 'deep NN with layers of size 28×28, 512, 256, 128 and 10'), but it does not provide specific experimental setup details such as hyperparameters (e.g., learning rate, batch size, number of epochs, optimizer settings) or other system-level training configurations.
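
For reference, the Hausdorff distance between tropical zonotopes that the abstract refers to is the standard Hausdorff distance between sets. Writing P and Q for the two zonotopes and using a suitable norm (the paper fixes the precise choice and constants), it reads:

$$
d_H(P, Q) = \max\Big\{ \sup_{p \in P} \inf_{q \in Q} \lVert p - q \rVert,\; \sup_{q \in Q} \inf_{p \in P} \lVert p - q \rVert \Big\}.
$$

The paper's approximation theorem bounds the error between two one-hidden-layer ReLU networks in terms of this distance between their zonotopes; see the paper for the exact statement.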
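As a rough illustration of the K-means step that both algorithms build on, the sketch below clusters the per-unit weight vectors of a fully connected hidden layer with scikit-learn's KMeans and merges each cluster into a single unit. This is an illustrative sketch under our own assumptions (the function name, feature construction, and merge rule are hypothetical), not the authors' exact Zonotope or Neural Path K-means procedure.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_compress_hidden_layer(W_in, b_in, W_out, k):
    """Illustrative sketch (not the paper's exact algorithm):
    cluster the hidden units of a one-hidden-layer ReLU network
    and merge each cluster into a single representative unit.

    W_in:  (hidden, input)  first-layer weights
    b_in:  (hidden,)        first-layer biases
    W_out: (output, hidden) second-layer weights
    k:     number of hidden units to keep
    """
    # Describe each hidden unit by its incoming weights, bias and outgoing weights.
    features = np.hstack([W_in, b_in[:, None], W_out.T])
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(features)

    # New first layer: one centroid unit per cluster.
    n_in = W_in.shape[1]
    W_in_new = km.cluster_centers_[:, :n_in]
    b_in_new = km.cluster_centers_[:, n_in]

    # New second layer: sum the outgoing weights of the merged units,
    # so each cluster's total contribution is roughly preserved.
    W_out_new = np.zeros((W_out.shape[0], k))
    for cluster_id in range(k):
        members = np.where(km.labels_ == cluster_id)[0]
        W_out_new[:, cluster_id] = W_out[:, members].sum(axis=1)

    return W_in_new, b_in_new, W_out_new
```

As described in the experiments above, only Neural Path K-means is applied in the multiclass setting, while the binary tasks use both methods; the sketch ignores that distinction and only conveys the merge-by-clustering idea.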
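Finally, the 'deep NN' used in the second experiment is specified only by its layer widths (28×28, 512, 256, 128, 10). A minimal PyTorch definition consistent with that description is sketched below; ReLU activations follow the paper's setting, while initialization and training details are left unspecified by the paper.

```python
import torch.nn as nn

# Minimal sketch of the "deep NN" described in the experiments: fully connected
# layers of widths 784 (28x28), 512, 256, 128 and 10. ReLU activations are
# assumed between layers; other training details are not given in the paper.
deep_nn = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 512), nn.ReLU(),
    nn.Linear(512, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 10),
)
```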