Neural Routing by Memory
Authors: Kaipeng Zhang, Zhenqiang Li, Zhifeng Li, Wei Liu, Yoichi Sato
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that our method improves VGGNet, ResNet, and EfficientNet accuracies on the Tiny ImageNet, ImageNet, and CIFAR-100 benchmarks with a negligible extra computational cost. |
| Researcher Affiliation | Collaboration | Kaipeng Zhang (1,2), Zhenqiang Li (1), Zhifeng Li (2), Wei Liu (2), Yoichi Sato (1); 1: Institute of Industrial Science, The University of Tokyo; 2: Tencent Data Platform |
| Pseudocode | No | The paper describes the method and training strategy in prose and mathematical equations but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | In this paper, we take three image classification benchmarks, Tiny ImageNet [24], ImageNet 2012 [30], and CIFAR-100 [22] to evaluate our method. |
| Dataset Splits | Yes | ImageNet 2012 consists of 1.2M training images and 50,000 validation images for 1000 classes. Tiny ImageNet is a subset of ImageNet. It consists of 200 classes, and each class has 500 training images and 50 validation images. CIFAR-100 consists of 100 classes, and each class has 500 training images and 100 validation images. |
| Hardware Specification | Yes | We use 8 V100 GPUs (32GB memory version) with PyTorch. |
| Software Dependencies | No | We use 8 V100 GPUs (32GB memory version) with PyTorch. We use Synchronized Batch Normalization (SyncBN) supported by the Nvidia APEX library. |
| Experiment Setup | Yes | Learning rate. We first follow the warmup strategy [10] to increase the learning rate from 1e-5 to 0.48 in the first five epochs. Then we use the cosine learning rate strategy for the rest of the epochs, and we decrease the learning rate to 1e-5 at the final epoch. The number of PUs. We set N = 8 for the accuracy and cost trade-off in other experiments. Batch Size. Simply, we multiply the original batch size (256) by N/2 even though the numbers of data instances among different PUs are imbalanced, which means that we use N = 8 and batch size 1024 in this paper. Momentum in Memory Updating. We set it to 0.9 in our paper. Others. Regarding other training details, we use stochastic gradient descent with momentum 0.9 and weight decay 1e-5 for ResNet and VGGNet. We use the RMSProp optimizer for EfficientNet. For CA modules, we use the reduction ratio of 16. |
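
To make the quoted training recipe concrete, below is a minimal PyTorch sketch of the warmup-plus-cosine learning-rate schedule and the SGD settings reported in the table. Only the quoted hyperparameters (warmup from 1e-5 to 0.48 over five epochs, cosine decay back to 1e-5, momentum 0.9, weight decay 1e-5, batch size 1024 for N = 8) come from the paper; the total epoch count, the placeholder model, and the helper name `lr_at_epoch` are assumptions for illustration and are not the authors' released code.

```python
# Minimal sketch of the quoted training setup, assuming PyTorch.
# The model and total epoch count are placeholders; only the hyperparameters
# quoted in the table above are taken from the paper.
import math
import torch

def lr_at_epoch(epoch, total_epochs=90, warmup_epochs=5,
                base_lr=0.48, start_lr=1e-5, final_lr=1e-5):
    """Linear warmup followed by cosine decay, per the quoted schedule."""
    if epoch < warmup_epochs:
        # Linear warmup from 1e-5 to 0.48 over the first five epochs.
        return start_lr + (base_lr - start_lr) * epoch / warmup_epochs
    # Cosine decay from 0.48 back down to 1e-5 over the remaining epochs.
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return final_lr + 0.5 * (base_lr - final_lr) * (1 + math.cos(math.pi * progress))

model = torch.nn.Linear(512, 200)  # placeholder for VGGNet / ResNet / EfficientNet
# The paper uses SyncBN via Nvidia APEX; PyTorch's built-in equivalent would be:
# model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)

# SGD with momentum 0.9 and weight decay 1e-5, as reported for ResNet/VGGNet.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-5,
                            momentum=0.9, weight_decay=1e-5)

for epoch in range(90):  # total epoch count is an assumption, not stated in the table
    lr = lr_at_epoch(epoch)
    for group in optimizer.param_groups:
        group["lr"] = lr
    # ... forward/backward passes with global batch size 1024 (256 * N/2, N = 8) ...
```

For EfficientNet the paper reports RMSProp instead of SGD, and the SyncBN line above is only a stand-in for the APEX-based setup described in the Software Dependencies row.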