Multi-Task Zipping via Layer-wise Neuron Sharing
Authors: Xiaoxi He, Zimu Zhou, Lothar Thiele
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the performance of MTZ on zipping networks pre-trained for the same task (Sec. 4.1) and different tasks (Sec. 4.2 and Sec. 4.3). We mainly assess the test errors of each task after network zipping and the retraining overhead involved. MTZ is implemented with TensorFlow. All experiments are conducted on a workstation equipped with an Nvidia Titan X (Maxwell) GPU. |
| Researcher Affiliation | Academia | Xiaoxi He, ETH Zurich, hex@ethz.ch; Zimu Zhou, ETH Zurich, zzhou@tik.ee.ethz.ch; Lothar Thiele, ETH Zurich, thiele@ethz.ch |
| Pseudocode | Yes | Algorithm 1: Multi-task Zipping via Layer-wise Neuron Sharing |
| Open Source Code | No | The paper states 'MTZ is implemented with TensorFlow.' but does not provide a link to, or an explicit statement about the availability of, the source code for the proposed methodology. |
| Open Datasets | Yes | We experiment on the MNIST dataset with the LeNet-300-100 and LeNet-5 networks [14] to recognize handwritten digits from zero to nine. We explore merging two VGG-16 networks trained on the ImageNet ILSVRC2012 dataset [24] for object classification and the CelebA dataset [16] for facial attribute classification. |
| Dataset Splits | No | The paper discusses 'test errors' and 'retraining', but does not explicitly provide specific train/validation/test dataset splits (e.g., percentages, sample counts, or detailed splitting methodology). |
| Hardware Specification | Yes | All experiments are conducted on a workstation equipped with Nvidia Titan X (Maxwell) GPU. |
| Software Dependencies | No | The paper states 'MTZ is implemented with TensorFlow.' but does not provide a version number for TensorFlow or any other software dependencies. |
| Experiment Setup | Yes | All the networks are initialized randomly with different seeds, and the training data are also shuffled before every training epoch. After training, the ordering of neurons/kernels in all hidden layers is once more randomly permuted. Training the LeNet-300-100 and LeNet-5 networks requires 1.05×10^4 and 1.1×10^4 iterations on average, respectively. |
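
The Pseudocode row above refers to Algorithm 1, the layer-wise neuron sharing procedure, which the paper defines via a Hessian-based functional difference between neurons and optimally adjusted merged weights. The sketch below is only a minimal, assumption-laden illustration of the zipping step: it pairs neurons of two corresponding hidden layers by plain Euclidean distance between their incoming weight vectors and merges matched pairs by simple averaging, both of which are stand-ins for the paper's actual criterion and weight update. The function name `zip_layers`, the layer sizes, and the sharing budget are hypothetical.

```python
import numpy as np

def zip_layers(W_a, W_b, num_shared):
    """Illustrative layer-wise neuron sharing between two pre-trained layers.

    W_a, W_b: (in_dim, out_dim) weight matrices of corresponding hidden layers
    of the task-A and task-B networks. Returns the merged weights for the
    shared neurons plus the task-specific leftover neurons.

    NOTE: MTZ pairs neurons by a Hessian-based functional difference and
    computes optimal merged weights; Euclidean distance and averaging below
    are simplifying assumptions for illustration only.
    """
    # Pairwise Euclidean distance between every neuron (column) of A and B,
    # using the ||a||^2 + ||b||^2 - 2 a.b expansion to avoid a large temporary.
    sq = (W_a ** 2).sum(axis=0)[:, None] + (W_b ** 2).sum(axis=0)[None, :]
    cost = np.sqrt(np.maximum(sq - 2.0 * (W_a.T @ W_b), 0.0))

    shared, used_a, used_b = [], set(), set()
    # Greedily pick the cheapest still-unmatched (i, j) pairs to share.
    for i, j in zip(*np.unravel_index(np.argsort(cost, axis=None), cost.shape)):
        if i in used_a or j in used_b:
            continue
        shared.append((i, j))
        used_a.add(i)
        used_b.add(j)
        if len(shared) == num_shared:
            break

    # Merged weights for shared neurons (simple average as a placeholder for
    # MTZ's optimally adjusted merged weights), plus task-specific columns.
    W_shared = np.stack([(W_a[:, i] + W_b[:, j]) / 2.0 for i, j in shared], axis=1)
    W_a_only = W_a[:, [i for i in range(W_a.shape[1]) if i not in used_a]]
    W_b_only = W_b[:, [j for j in range(W_b.shape[1]) if j not in used_b]]
    return W_shared, W_a_only, W_b_only


# Toy usage, mirroring the setup row above: two independently initialized
# LeNet-300-100-sized first hidden layers (784 inputs, 300 neurons), with the
# neuron ordering of the second network randomly permuted before zipping.
rng = np.random.default_rng(0)
W_a = rng.normal(size=(784, 300))
W_b = rng.normal(size=(784, 300))[:, rng.permutation(300)]
W_shared, W_a_only, W_b_only = zip_layers(W_a, W_b, num_shared=100)
print(W_shared.shape, W_a_only.shape, W_b_only.shape)  # (784, 100) (784, 200) (784, 200)
```

In MTZ proper, each zipped layer is followed by light retraining of the remaining layers, which is the retraining overhead assessed in the experiments; the sketch above omits that step entirely.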