VanillaNet: the Power of Minimalism in Deep Learning

Authors: Hanting Chen, Yunhe Wang, Jianyuan Guo, Dacheng Tao

NeurIPS 2023

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Extensive experimentation demonstrates that VanillaNet delivers performance on par with renowned deep neural networks and vision transformers, showcasing the power of minimalism in deep learning. |
| Researcher Affiliation | Collaboration | Hanting Chen¹, Yunhe Wang¹, Jianyuan Guo¹, Dacheng Tao². ¹Huawei Noah's Ark Lab. ²School of Computer Science, University of Sydney. |
| Pseudocode | No | The paper describes the deep training strategy and the series activation function but does not present them in a structured pseudocode or algorithm block. (An illustrative sketch of both ideas follows the table.) |
| Open Source Code | Yes | Pre-trained models and code are available at https://github.com/huawei-noah/VanillaNet and https://gitee.com/mindspore/models/tree/master/research/cv/vanillanet. |
| Open Datasets | Yes | "To illustrate the effectiveness of the proposed method, we conduct experiments on the ImageNet [8] dataset, which consists of 224×224 pixel RGB color images. The ImageNet dataset contains 1.28M training images and 50K validation images with 1000 categories." |
| Dataset Splits | Yes | The ImageNet dataset contains 1.28M training images and 50K validation images with 1000 categories. (A loading sketch follows the table.) |
| Hardware Specification | Yes | Latency is tested on an Nvidia A100 GPU with a batch size of 1. (A timing sketch follows the table.) |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., programming-language versions or library versions such as PyTorch, TensorFlow, or scikit-learn). |
| Experiment Setup | Yes | Table 8 ("ImageNet-1K training settings") provides specific values such as weight init trunc. normal (0.2), optimizer LAMB [58], loss function BCE loss, base learning rate 3.5e-3 {5,8-13} / 4.8e-3 {6-7}, weight decay 0.35/0.35/0.35/0.3/0.3/0.25/0.3/0.3/0.3, batch size 1024, training epochs 300, learning rate schedule cosine decay, dropout 0.05, and others. (A config sketch follows the table.) |
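
Since the paper describes the series activation function and the deep training strategy only in prose, the sketch below shows one way these two ideas are commonly realised in PyTorch: the series activation as a plain ReLU whose outputs are aggregated over a learnable spatial neighbourhood (a depthwise convolution), and the deep training strategy as a nonlinearity that is annealed toward the identity so the layers it separates can be merged after training. The class and function names, the linear epoch schedule, and the ReLU choice are illustrative assumptions, not the authors' exact code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SeriesActivation(nn.Module):
    """Sketch of a series-informed activation: a simple nonlinearity whose
    outputs are combined over a spatial neighbourhood with learnable
    per-channel weights, realised here as a depthwise convolution."""

    def __init__(self, channels: int, act_num: int = 3):
        super().__init__()
        kernel = 2 * act_num + 1
        self.weight = nn.Parameter(torch.randn(channels, 1, kernel, kernel) * 0.02)
        self.bias = nn.Parameter(torch.zeros(channels))
        self.act_num = act_num
        self.channels = channels

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = F.relu(x)
        # Each output position mixes (2*act_num+1)^2 shifted activation values
        # of its own channel -- the "series" of stacked activations.
        return F.conv2d(x, self.weight, self.bias,
                        padding=self.act_num, groups=self.channels)


def annealed_activation(x: torch.Tensor, epoch: int, total_epochs: int) -> torch.Tensor:
    """Sketch of the deep training idea: the nonlinearity decays toward the
    identity as training progresses (assumed linear schedule), so the two
    convolutions around it can be merged into one layer at inference time."""
    lam = epoch / total_epochs
    return (1.0 - lam) * F.relu(x) + lam * x
```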
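
For the dataset splits, the following is a minimal loading sketch assuming the standard ImageNet-1K directory layout (a `train/` and a `val/` folder with one subdirectory per class). The paths, augmentation, and loader settings are placeholders rather than the paper's exact pipeline.

```python
import torch
from torchvision import datasets, transforms

# Standard ImageNet-1K layout assumed: <root>/train/<class>/* and <root>/val/<class>/*
# 1.28M training images, 50K validation images, 1000 categories.
train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
val_tf = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

train_set = datasets.ImageFolder("/path/to/imagenet/train", transform=train_tf)
val_set = datasets.ImageFolder("/path/to/imagenet/val", transform=val_tf)

train_loader = torch.utils.data.DataLoader(train_set, batch_size=1024,
                                           shuffle=True, num_workers=8)
val_loader = torch.utils.data.DataLoader(val_set, batch_size=1024,
                                         shuffle=False, num_workers=8)
```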
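
Latency is reported on an Nvidia A100 at batch size 1. The timing loop below is a generic GPU latency-measurement sketch (warm-up iterations, CUDA synchronisation, averaged wall-clock time), not the authors' benchmarking script.

```python
import time
import torch


@torch.no_grad()
def measure_latency(model: torch.nn.Module, input_size=(1, 3, 224, 224),
                    warmup: int = 50, iters: int = 200) -> float:
    """Return the average forward-pass latency in milliseconds at batch size 1."""
    device = torch.device("cuda")
    model = model.to(device).eval()
    x = torch.randn(input_size, device=device)

    for _ in range(warmup):          # warm up so kernels and caches are initialised
        model(x)
    torch.cuda.synchronize()

    start = time.perf_counter()
    for _ in range(iters):
        model(x)
    torch.cuda.synchronize()         # wait for all GPU work before stopping the clock
    return (time.perf_counter() - start) / iters * 1000.0
```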
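
The Experiment Setup row quotes Table 8 of the paper; the dictionary below simply collects those values in config form as a sketch. The structure and key names are illustrative, and LAMB is not part of core PyTorch, so an external implementation (e.g. the one in timm) would be needed in practice.

```python
# Hyperparameters quoted from Table 8 (ImageNet-1K training settings).
# Per-depth values ({5}..{13} refer to VanillaNet-5 through VanillaNet-13) are kept verbatim.
train_config = {
    "weight_init": "trunc. normal (0.2)",
    "optimizer": "LAMB",                 # not in core PyTorch; e.g. timm provides one
    "loss_function": "BCE loss",
    "base_learning_rate": "3.5e-3 {5,8-13} / 4.8e-3 {6-7}",
    "weight_decay": "0.35/0.35/0.35/0.3/0.3/0.25/0.3/0.3/0.3",
    "batch_size": 1024,
    "training_epochs": 300,
    "lr_schedule": "cosine decay",
    "dropout": 0.05,
}
```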