Regularizing Deep Neural Networks by Noise: Its Interpretation and Optimization

Authors: Hyeonwoo Noh, Tackgeun You, Jonghwan Mun, Bohyung Han

NeurIPS 2017

Reproducibility variables, each with the assessed result and the supporting LLM response:
Research Type: Experimental. "We evaluate the proposed training algorithm in various architectures for real world tasks including object recognition [40], visual question answering [39], image captioning [35] and action recognition [8]. These models are chosen for our experiments since they use dropouts actively for regularization. To isolate the effect of the proposed training method, we employ simple models without integrating heuristics for performance improvement (e.g., model ensembles, multi-scaling, etc.) and make hyper-parameters (e.g., type of optimizer, learning rate, batch size, etc.) fixed."
Researcher Affiliation: Academia. Hyeonwoo Noh, Tackgeun You, Jonghwan Mun, Bohyung Han; Dept. of Computer Science and Engineering, POSTECH, Korea; {shgusdngogo,tackgeun.you,choco1916,bhhan}@postech.ac.kr
Pseudocode: No. The paper does not contain structured pseudocode or algorithm blocks.
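Since no pseudocode is provided, the following is a minimal PyTorch sketch of how the paper's importance-weighted update (IWSGD) could be implemented for a classification model with dropout as the injected noise. The objective log((1/S) Σ_s p(y|x, ε_s)) follows the paper's multi-sample bound, but the function name, loss form, and all other details here are assumptions rather than the authors' code.

```python
import math
import torch
import torch.nn.functional as F

def iwsgd_step(model, optimizer, x, y, num_samples=4):
    """One hypothetical importance-weighted update: draw `num_samples`
    dropout noise samples per example and maximize
    log( (1/S) * sum_s p(y | x, eps_s) ).
    Backpropagating through the log-sum-exp automatically yields the
    importance-weighted gradient with weights w_s = p_s / sum_s' p_s'."""
    model.train()  # keep dropout (the injected noise) active
    optimizer.zero_grad()

    log_probs = []
    for _ in range(num_samples):
        logits = model(x)  # a fresh dropout mask is sampled on each pass
        # per-example log p(y | x, eps_s)
        log_probs.append(-F.cross_entropy(logits, y, reduction="none"))
    log_probs = torch.stack(log_probs, dim=0)  # shape (S, batch)

    # negative importance-weighted bound, averaged over the minibatch
    loss = -(torch.logsumexp(log_probs, dim=0) - math.log(num_samples)).mean()
    loss.backward()
    optimizer.step()
    return loss.item()
```

With num_samples = 1 the objective reduces to the usual per-example cross-entropy, i.e. ordinary SGD with dropout.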
Open Source Code: No. The paper mentions using a publicly available implementation of Wide ResNet for its experiments (https://github.com/szagoruyko/wide-residual-networks) along with other third-party code, but it does not provide its own code or explicitly state that the code for the proposed method is open source.
Open Datasets: Yes. "We evaluate the proposed training algorithm in various architectures for real world tasks including object recognition [40], visual question answering [39], image captioning [35] and action recognition [8]... evaluated on CIFAR datasets [19]... We use VQA dataset [2], which is commonly used for the evaluation of VQA algorithms... We use MSCOCO dataset for experiment... We employ a well-known benchmark of action classification, UCF-101 [33], for evaluation..."
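All of the benchmarks listed above are publicly available. As an illustration only, here is a sketch of loading the CIFAR data with torchvision; the use of torchvision and the normalization statistics are assumptions, since the paper does not describe its data-loading code.

```python
import torchvision
import torchvision.transforms as T

# Commonly used CIFAR-10 augmentation and normalization values; these are
# conventional defaults, not settings taken from the paper.
train_transform = T.Compose([
    T.RandomCrop(32, padding=4),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    T.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])

train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=train_transform)
test_set = torchvision.datasets.CIFAR10(
    root="./data", train=False, download=True, transform=T.ToTensor())
```

torchvision also ships dataset classes for MSCOCO captions and UCF-101; the VQA dataset is distributed separately by its authors.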
Dataset Splits: Yes. "The dataset has three splits for cross validation, and the final performance is calculated by the average accuracy of the three splits." (UCF-101)
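A one-function sketch of the averaging described above; the per-split accuracy values in the example are hypothetical.

```python
def ucf101_final_accuracy(split_accuracies):
    """Final UCF-101 performance as described in the excerpt:
    the mean accuracy over the three official train/test splits."""
    assert len(split_accuracies) == 3, "UCF-101 defines three splits"
    return sum(split_accuracies) / len(split_accuracies)

# hypothetical per-split accuracies
print(ucf101_final_accuracy([0.82, 0.81, 0.83]))  # -> approximately 0.82
```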
Hardware Specification: No. The paper does not provide specific hardware details (e.g., GPU models, CPU types, or memory amounts) used for running the experiments; it only states in general terms that hyper-parameters were kept fixed.
Software Dependencies: No. The paper names models and architectures such as a two-layer LSTM and VGG-16, but does not list specific software dependencies with version numbers (e.g., PyTorch 1.x, TensorFlow 2.x, or particular CUDA versions).
Experiment Setup: Yes. "We perform experiments using the wide residual network with widening factor 10 and depth 28... Wide ResNet (depth=28, dropout=0.3) [40]... Wide ResNet (depth=28, dropout=0.5)... with IWSGD (S = 4)... with IWSGD (S = 8)... When we evaluate performance of IWSGD with 5 and 8 samples..."
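The quoted setup pins down the architecture (a Wide ResNet with depth 28 and widening factor 10), the dropout rates (0.3 and 0.5), and the importance-weighted sample counts. A hypothetical enumeration of those configurations is sketched below; the field names, and any pairing of a dropout rate with a sample count, are assumptions rather than settings stated in the paper.

```python
# Only depth, widening factor, dropout rates, and sample counts come from the
# quoted excerpt; every field name and the pairing of values are assumptions.
BASE = {"architecture": "Wide ResNet", "depth": 28, "widen_factor": 10}

VARIANTS = [
    {**BASE, "dropout": 0.3, "method": "standard dropout"},
    {**BASE, "dropout": 0.5, "method": "standard dropout"},
    {**BASE, "dropout": 0.3, "method": "IWSGD", "num_samples": 4},
    {**BASE, "dropout": 0.3, "method": "IWSGD", "num_samples": 8},
]

for cfg in VARIANTS:
    print(cfg)  # in a real run, build the model and train it with this setting
```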