VanillaNet: the Power of Minimalism in Deep Learning
Authors: Hanting Chen, Yunhe Wang, Jianyuan Guo, Dacheng Tao
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimentation demonstrates that VanillaNet delivers performance on par with renowned deep neural networks and vision transformers, showcasing the power of minimalism in deep learning. |
| Researcher Affiliation | Collaboration | Hanting Chen¹, Yunhe Wang¹, Jianyuan Guo¹, Dacheng Tao². ¹Huawei Noah's Ark Lab. ²School of Computer Science, University of Sydney. |
| Pseudocode | No | The paper describes the deep training strategy and the series-informed activation function in prose and equations but does not present them in a structured pseudocode or algorithm block (a hedged sketch of the series activation appears after this table). |
| Open Source Code | Yes | Pre-trained models and code are available at https://github.com/huawei-noah/VanillaNet and https://gitee.com/mindspore/models/tree/master/research/cv/vanillanet. |
| Open Datasets | Yes | To illustrate the effectiveness of the proposed method, we conduct experiments on the ImageNet [8] dataset, which consists of 224×224 pixel RGB color images. The ImageNet dataset contains 1.28M training images and 50K validation images with 1000 categories. |
| Dataset Splits | Yes | The ImageNet dataset contains 1.28M training images and 50K validation images with 1000 categories. |
| Hardware Specification | Yes | Latency is tested on an Nvidia A100 GPU with a batch size of 1. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., programming language versions, library versions like PyTorch, TensorFlow, or scikit-learn). |
| Experiment Setup | Yes | Table 8: ImageNet-1K training settings. The table provides specific values such as weight init trunc. normal (0.2), optimizer LAMB [58], loss function BCE, base learning rate 3.5e-3 (VanillaNet-5, 8-13) / 4.8e-3 (VanillaNet-6, 7), weight decay 0.35/0.35/0.35/0.3/0.3/0.25/0.3/0.3/0.3 (per variant), batch size 1024, 300 training epochs, cosine-decay learning rate schedule, dropout 0.05, and others (a hedged training-setup sketch follows the table). |
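
Since the paper gives no pseudocode for the series-informed activation, here is a minimal PyTorch sketch of the idea: the paper's formulation A_s(x) = Σ_{i,j} a_{i,j} A(x + b_{i,j}) realized as a base ReLU followed by a learnable depthwise convolution. The class name, the initialization scale, and the default `act_num` are illustrative assumptions, not the authors' exact code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SeriesActivation(nn.Module):
    """Sketch of a series-informed activation: a base non-linearity
    (ReLU) followed by a learnable depthwise convolution that sums
    shifted copies of the activation, so each channel learns its own
    weighted combination of neighboring activations."""

    def __init__(self, dim: int, act_num: int = 3):
        super().__init__()
        k = 2 * act_num + 1  # spatial extent of the series
        # One (k x k) set of coefficients per channel (depthwise weight).
        self.weight = nn.Parameter(torch.randn(dim, 1, k, k) * 0.02)
        self.bn = nn.BatchNorm2d(dim)
        self.act_num = act_num
        self.dim = dim

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = F.relu(x)
        # Depthwise conv mixes each channel's shifted activations;
        # padding keeps the spatial size unchanged.
        x = F.conv2d(x, self.weight, padding=self.act_num, groups=self.dim)
        return self.bn(x)
```

For example, `SeriesActivation(64)(torch.randn(2, 64, 56, 56))` returns a tensor of the same shape; the depthwise convolution is what adds capacity to the otherwise plain activation without deepening the network.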
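
Similarly, the Table 8 settings from the experiment-setup row can be wired together in a few lines. The sketch below assumes timm's `Lamb` optimizer and PyTorch's built-in cosine scheduler; the stand-in model and the `cfg` names are hypothetical, and only the numeric values come from the paper.

```python
import torch
import torch.nn as nn
from timm.optim import Lamb  # LAMB [58]; assumes timm provides this class

# Values transcribed from Table 8 (ImageNet-1K training settings).
cfg = dict(
    base_lr=3.5e-3,     # 4.8e-3 for VanillaNet-6/7
    weight_decay=0.35,  # 0.25-0.35 depending on the variant
    batch_size=1024,
    epochs=300,
    dropout=0.05,
)

# Hypothetical stand-in for a VanillaNet variant.
model = nn.Sequential(nn.Flatten(), nn.Dropout(cfg["dropout"]),
                      nn.Linear(3 * 224 * 224, 1000))

# Truncated-normal weight init (0.2), per Table 8.
for m in model.modules():
    if isinstance(m, nn.Linear):
        nn.init.trunc_normal_(m.weight, std=0.2)

optimizer = Lamb(model.parameters(), lr=cfg["base_lr"],
                 weight_decay=cfg["weight_decay"])
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=cfg["epochs"])
criterion = nn.BCEWithLogitsLoss()  # BCE loss over one-hot targets
```

The BCE criterion expects one-hot (or label-smoothed) targets rather than class indices, which matches the BCE-loss entry in Table 8.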