Interpreting and Boosting Dropout from a Game-Theoretic View

Authors: Hao Zhang, Sen Li, Yinchao Ma, Mingjie Li, Yichen Xie, Quanshi Zhang

ICLR 2021

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We prove that dropout can suppress the strength of interactions between input variables of deep neural networks (DNNs). The theoretic proof is also verified by various experiments." (A sampling-based estimate of this interaction strength is sketched after the table.) |
| Researcher Affiliation | Academia | Hao Zhang (Shanghai Jiao Tong University, 1603023-zh@sjtu.edu.cn); Sen Li (Sun Yat-sen University, lisen6@mail2.sysu.edu.cn); Yinchao Ma (Huazhong University of Science and Technology, u201713506@hust.edu.cn); Mingjie Li (Shanghai Jiao Tong University, limingjie0608@sjtu.edu.cn); Yichen Xie (Shanghai Jiao Tong University, xieyichen@sjtu.edu.cn); Quanshi Zhang (Shanghai Jiao Tong University, zqs1022@sjtu.edu.cn) |
| Pseudocode | No | No pseudocode or algorithm blocks were found. |
| Open Source Code | No | The paper mentions third-party codebases (e.g., pytorch-cifar100, suinleelab) but does not state that the authors release their own code for the described methodology. |
| Open Datasets | Yes | MNIST (LeCun et al., 1998), CelebA (Liu et al., 2015), Tiny ImageNet (Le & Yang, 2015), CIFAR-10 (Krizhevsky & Hinton, 2009), and SST-2 (Socher et al., 2013). |
| Dataset Splits | No | The paper does not specify exact train/validation/test split percentages or absolute sample counts for each split. It mentions sampling training data, but gives no splitting methodology precise enough for reproduction. |
| Hardware Specification | Yes | "We trained AlexNet and VGG-11 using the CIFAR-10 dataset on a GPU of GeForce GTX 1080 Ti." |
| Software Dependencies | No | The paper mentions PyTorch implicitly through a reference to pytorch-cifar100, but does not specify a version number for PyTorch or any other software dependency. |
| Experiment Setup | Yes | "For each DNN, we put the dropout operation and the interaction loss in the low convolutional layer (before the 3rd/5th convolutional layer of the AlexNet/VGGs) and the high fully-connected layer (before the 2nd fully-connected layer), respectively... when we trained DNNs with dropout, we set the dropout rate as 0.5... In this paper, we set α=0.05... Thus, we set the sampling number as 500 in all other experiments in this paper." (A sketch of this dropout placement follows the table.) |
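
The paper's central quantity is the game-theoretic interaction between input variables that dropout is proven to suppress, and the quoted setup fixes its sampling number at 500. As a concrete reference point, here is a minimal PyTorch sketch of a sampling-based pairwise-interaction estimate; the function name, the uniform context sampling, and the flat-vector masking are illustrative assumptions, not the authors' released implementation.

```python
import torch

@torch.no_grad()
def interaction_strength(v, x, baseline, i, j, n_samples=500):
    """Monte Carlo estimate of the pairwise interaction
        I(i, j) = E_S[ v(S ∪ {i, j}) − v(S ∪ {i}) − v(S ∪ {j}) + v(S) ],
    where S is a random context over the remaining variables and v(S)
    scores the input with variables outside S replaced by `baseline`.
    """
    n = x.numel()
    flat_x, flat_b = x.flatten(), baseline.flatten()
    others = torch.tensor([k for k in range(n) if k not in (i, j)],
                          dtype=torch.long)

    def value(mask, extra):
        m = mask.clone()
        for k in extra:  # force i and/or j into the context
            m[k] = True
        return v(torch.where(m, flat_x, flat_b).view_as(x)).item()

    total = 0.0
    for _ in range(n_samples):
        # Uniformly sample a context S over the remaining variables
        # (a Banzhaf-style simplification; the paper's exact sampling
        # scheme may differ).
        mask = torch.zeros(n, dtype=torch.bool)
        mask[others[torch.rand(len(others)) < 0.5]] = True
        total += (value(mask, (i, j)) - value(mask, (i,))
                  - value(mask, (j,)) + value(mask, ()))
    return total / n_samples
```

For a classifier, `v` might be `lambda inp: model(inp.unsqueeze(0))[0, target]`, scoring the target logit of a masked input against a zero baseline (`torch.zeros_like(x)`).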
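
The quoted Experiment Setup places dropout (rate 0.5) before the 3rd convolutional layer and before the 2nd fully-connected layer of AlexNet. Below is a minimal sketch of that placement; the class name, layer widths, and pooling choices are hypothetical stand-ins, not the authors' code.

```python
import torch.nn as nn

# Illustrative AlexNet-style network: dropout with p = 0.5 is inserted
# before the 3rd conv layer and before the 2nd fully-connected layer,
# mirroring the placement described in the quoted setup.
class AlexNetDropout(nn.Module):
    def __init__(self, num_classes=10, p=0.5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 192, 3, padding=1), nn.ReLU(),
            nn.Dropout2d(p),                      # before the 3rd conv layer
            nn.Conv2d(192, 384, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Sequential(
            nn.Linear(384, 256), nn.ReLU(),
            nn.Dropout(p),                        # before the 2nd FC layer
            nn.Linear(256, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))
```

In the paper's proposed variant, an interaction loss is applied at these same positions instead of dropout; the excerpt fixes α = 0.05 and a 500-sample budget for the interaction computation, but does not spell out the loss's exact form, so it is not reproduced here.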