Beyond Dropout: Feature Map Distortion to Regularize Deep Neural Networks
Authors: Yehui Tang, Yunhe Wang, Yixing Xu, Boxin Shi, Chao Xu, Chunjing Xu, Chang Xu (pp. 5964-5971)
AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments In this section, we conduct experiments on several benchmark datasets to validate the effectiveness of the proposed feature map distortion method. The method is implemented on both FC layers and convolutional layers, which are validated with conventional CNNs and modern CNNs (e.g., ResNet) respectively. |
| Researcher Affiliation | Collaboration | Yehui Tang,1 Yunhe Wang,2 Yixing Xu,2 Boxin Shi,4,5 Chao Xu,1 Chunjing Xu,2 Chang Xu3 1Key Lab of Machine Perception (MOE), CMIC, School of EECS, Peking University, China 2Huawei Noah's Ark Lab, 3School of Computer Science, Faculty of Engineering, The University of Sydney, Australia |
| Pseudocode | Yes | Algorithm 1 Feature map distortion for training networks. |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | Experiments In this section, we conduct experiments on several benchmark datasets to validate the effectiveness of the proposed feature map distortion method... Dataset. The CIFAR-10 and CIFAR-100 datasets both contain 60000 natural images... The ImageNet dataset contains 1.2M training images and 50000 validation images, consisting of 1000 categories. |
| Dataset Splits | Yes | Dataset. The CIFAR-10 and CIFAR-100 datasets both contain 60000 natural images of size 32×32. 50000 images are used for training and 10000 for testing. The images are divided into 10 categories and 100 categories, respectively. 20% of the training data are regarded as validation sets. ... The ImageNet dataset contains 1.2M training images and 50000 validation images, consisting of 1000 categories. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory, or cloud instance types) used for running experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | The distortion probability (dropping probability for dropout and dropblock) increases linearly from 0 to the appointed distortion probability p, following (Ghiasi, Lin, and Le 2018). ... Distortion probability p is selected from {0.4, 0.5, 0.6} and the step length γ is set to 5. The model is trained for 500 epochs with batch size 128. The learning rate is initialized at 0.01 and decayed by a factor of 10 at 200, 300, and 400 epochs. ... The networks are trained for 200 epochs, batch size is set to 128, and weight decay is set to 5e-4. The initial learning rate is set to 0.1 and is decayed by a factor of 5 at 60, 120, and 160 epochs. |
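The paper releases no code, so the schedules quoted in the Experiment Setup row can only be sketched. Below is a minimal, hedged reconstruction of the two schedules the excerpt describes: the distortion probability ramping linearly from 0 to the chosen p over training, and the step learning-rate decay (factor 10 at epochs 200/300/400 from an initial 0.01). Function names, the ramp endpoint, and the per-epoch granularity are assumptions, not from the paper.

```python
def distortion_prob(epoch, total_epochs, p_target):
    """Linearly ramp the distortion probability from 0 to p_target
    over training (per-epoch granularity is an assumption; the paper
    only says the probability 'increases linearly')."""
    return p_target * min(epoch / max(total_epochs - 1, 1), 1.0)


def learning_rate(epoch, base_lr=0.01, milestones=(200, 300, 400), gamma=0.1):
    """Step decay: multiply base_lr by gamma at each milestone epoch,
    matching 'decayed by a factor of 10 at 200, 300, and 400 epochs'."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr
```

For the CIFAR ResNet setting quoted later in the same row, the same `learning_rate` helper would be called with `base_lr=0.1`, `milestones=(60, 120, 160)`, and `gamma=0.2` (decay by a factor of 5).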