Neural Routing by Memory
Authors: Kaipeng Zhang, Zhenqiang Li, Zhifeng Li, Wei Liu, Yoichi Sato
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that our method improves VGGNet, ResNet, and EfficientNet accuracies on the Tiny ImageNet, ImageNet, and CIFAR-100 benchmarks with a negligible extra computational cost. |
| Researcher Affiliation | Collaboration | Kaipeng Zhang (1,2), Zhenqiang Li (1), Zhifeng Li (2), Wei Liu (2), Yoichi Sato (1); 1: Institute of Industrial Science, The University of Tokyo; 2: Tencent Data Platform |
| Pseudocode | No | The paper describes the method and training strategy in prose and mathematical equations but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | In this paper, we take three image classification benchmarks, Tiny ImageNet [24], ImageNet 2012 [30], and CIFAR-100 [22] to evaluate our method. |
| Dataset Splits | Yes | ImageNet 2012 consists of 1.2M training images and 50,000 validation images for 1000 classes. Tiny ImageNet is a subset of ImageNet. It consists of 200 classes, and each class has 500 training images and 50 validation images. CIFAR-100 consists of 100 classes, and each class has 500 training images and 100 validation images. |
| Hardware Specification | Yes | We use 8 V100 GPUs (32GB memory version) with PyTorch. |
| Software Dependencies | No | We use 8 V100 GPUs (32GB memory version) with PyTorch. We use Synchronized Batch Normalization (SyncBN) supported by the Nvidia APEX library. |
| Experiment Setup | Yes | Learning rate. We first follow the warmup strategy [10] to increase the learning rate from 1e-5 to 0.48 in the first five epochs. Then we use the cosine learning rate strategy for the rest of the epochs, and we decrease the learning rate to 1e-5 at the final epoch. The number of PUs. We set N = 8 for the accuracy and cost trade-off in other experiments. Batch Size. Simply, we multiply the original batch size (256) by N/2 even though the numbers of data instances among different PUs are imbalanced, which means that we use N = 8 and batch size 1024 in this paper. Momentum in Memory Updating. We set it to 0.9 in our paper. Others. Regarding other training details, we use stochastic gradient descent with momentum 0.9 and weight decay 1e-5 for ResNet and VGGNet. We use the RMSProp optimizer for EfficientNet. For CA modules, we use the reduction ratio of 16. |
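
To make the quoted training recipe concrete, below is a minimal PyTorch sketch of the warmup-plus-cosine learning-rate schedule and the SGD settings reported in the table. Only the quoted hyperparameters (warmup from 1e-5 to 0.48 over five epochs, cosine decay back to 1e-5, momentum 0.9, weight decay 1e-5, batch size 1024 for N = 8) come from the paper; the total epoch count, the placeholder model, and the helper name `lr_at_epoch` are assumptions for illustration and are not the authors' released code.

```python
# Minimal sketch of the quoted training setup, assuming PyTorch.
# The model and total epoch count are placeholders; only the hyperparameters
# quoted in the table above are taken from the paper.
import math
import torch

def lr_at_epoch(epoch, total_epochs=90, warmup_epochs=5,
                base_lr=0.48, start_lr=1e-5, final_lr=1e-5):
    """Linear warmup followed by cosine decay, per the quoted schedule."""
    if epoch < warmup_epochs:
        # Linear warmup from 1e-5 to 0.48 over the first five epochs.
        return start_lr + (base_lr - start_lr) * epoch / warmup_epochs
    # Cosine decay from 0.48 back down to 1e-5 over the remaining epochs.
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return final_lr + 0.5 * (base_lr - final_lr) * (1 + math.cos(math.pi * progress))

model = torch.nn.Linear(512, 200)  # placeholder for VGGNet / ResNet / EfficientNet
# The paper uses SyncBN via Nvidia APEX; PyTorch's built-in equivalent would be:
# model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)

# SGD with momentum 0.9 and weight decay 1e-5, as reported for ResNet/VGGNet.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-5,
                            momentum=0.9, weight_decay=1e-5)

for epoch in range(90):  # total epoch count is an assumption, not stated in the table
    lr = lr_at_epoch(epoch)
    for group in optimizer.param_groups:
        group["lr"] = lr
    # ... forward/backward passes with global batch size 1024 (256 * N/2, N = 8) ...
```

For EfficientNet the paper reports RMSProp instead of SGD, and the SyncBN line above is only a stand-in for the APEX-based setup described in the Software Dependencies row.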