Interpreting and Boosting Dropout from a Game-Theoretic View
Authors: Hao Zhang, Sen Li, Yinchao Ma, Mingjie Li, Yichen Xie, Quanshi Zhang
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We prove that dropout can suppress the strength of interactions between input variables of deep neural networks (DNNs). The theoretic proof is also verified by various experiments. |
| Researcher Affiliation | Academia | Hao Zhang (Shanghai Jiao Tong University, 1603023-zh@sjtu.edu.cn); Sen Li (Sun Yat-sen University, lisen6@mail2.sysu.edu.cn); Yinchao Ma (Huazhong University of Science and Technology, u201713506@hust.edu.cn); Mingjie Li (Shanghai Jiao Tong University, limingjie0608@sjtu.edu.cn); Yichen Xie (Shanghai Jiao Tong University, xieyichen@sjtu.edu.cn); Quanshi Zhang (Shanghai Jiao Tong University, zqs1022@sjtu.edu.cn) |
| Pseudocode | No | No pseudocode or algorithm blocks were found. |
| Open Source Code | No | The paper mentions using and referring to third-party codebases (e.g., pytorch-cifar100, suinleelab) but does not state that the authors are releasing their own code for the described methodology. |
| Open Datasets | Yes | MNIST (LeCun et al., 1998), CelebA (Liu et al., 2015), Tiny ImageNet (Le & Yang, 2015), CIFAR-10 dataset (Krizhevsky & Hinton, 2009), SST-2 dataset (Socher et al., 2013) |
| Dataset Splits | No | The paper does not specify exact train/validation/test split percentages or absolute sample counts for each split. It mentions sampling training data but does not describe a splitting methodology in enough detail to reproduce. |
| Hardware Specification | Yes | We trained AlexNet and VGG-11 using the CIFAR-10 dataset on a GPU of GeForce GTX-1080Ti. |
| Software Dependencies | No | The paper implicitly mentions PyTorch through a reference to 'pytorch-cifar100' but does not specify a version number for PyTorch or any other software dependency. |
| Experiment Setup | Yes | For each DNN, we put the dropout operation and the interaction loss in the low convolutional layer (before the 3rd/5th convolutional layer of the AlexNet/VGGs) and the high fully-connected layer (before the 2nd fully-connected layer), respectively... when we trained DNNs with dropout, we set the dropout rate as 0.5... In this paper, we set α=0.05... Thus, we set the sampling number as 500 in all other experiments in this paper. |
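
The dropout placement quoted in the Experiment Setup row translates directly into model code. Below is a minimal sketch, assuming PyTorch and a CIFAR-10-sized AlexNet variant; the channel widths and kernel sizes are illustrative assumptions, not the authors' exact configuration. Only the dropout positions (before the 3rd convolutional layer and before the 2nd fully-connected layer) and the rate of 0.5 follow the paper's description.

```python
# A minimal sketch (not the authors' released code) of the dropout placement
# described in the paper, assuming PyTorch. Channel widths and kernel sizes
# are illustrative; only the dropout positions and p=0.5 follow the paper.
import torch
import torch.nn as nn

class AlexNetWithDropout(nn.Module):
    def __init__(self, num_classes: int = 10, p: float = 0.5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 192, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Dropout(p),  # dropout before the 3rd convolutional layer
            nn.Conv2d(192, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Linear(256 * 4 * 4, 4096), nn.ReLU(inplace=True),
            nn.Dropout(p),  # dropout before the 2nd fully-connected layer
            nn.Linear(4096, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        x = torch.flatten(x, 1)
        return self.classifier(x)

model = AlexNetWithDropout()
logits = model(torch.randn(2, 3, 32, 32))  # CIFAR-10-sized input
print(logits.shape)                         # torch.Size([2, 10])
```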
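
The "sampling number as 500" refers to Monte Carlo estimation of the interactions between input variables that the paper proves dropout suppresses. The sketch below illustrates one standard way to estimate a pairwise interaction by sampling random contexts; it is not the paper's implementation, and `model_fn`, `x`, and `baseline` are hypothetical names introduced here for illustration.

```python
# A minimal sketch, under stated assumptions, of Monte Carlo estimation of
# the interaction between input variables i and j:
#   I(i, j) ~= E_S[ f(S u {i,j}) - f(S u {i}) - f(S u {j}) + f(S) ]
# over random contexts S; the paper reports using 500 samples.
import torch

def estimate_interaction(model_fn, x, baseline, i, j, n_samples=500):
    """model_fn : callable mapping a batch of inputs to scalar outputs
    x          : input tensor of shape (d,) (flattened input variables)
    baseline   : tensor of shape (d,) used to mask absent variables
    """
    d = x.numel()
    total = 0.0
    for _ in range(n_samples):
        # Sample a random context S over the remaining d-2 variables.
        mask = torch.rand(d) < 0.5
        mask[i] = False
        mask[j] = False

        def masked(with_i, with_j):
            m = mask.clone()
            m[i], m[j] = with_i, with_j
            return torch.where(m, x, baseline)

        batch = torch.stack([masked(True, True), masked(True, False),
                             masked(False, True), masked(False, False)])
        v = model_fn(batch)
        total += (v[0] - v[1] - v[2] + v[3]).item()
    return total / n_samples

# Sanity check: a purely additive model has zero interaction.
print(estimate_interaction(lambda b: b.sum(dim=1),
                           torch.ones(8), torch.zeros(8), 0, 1))  # ~0.0
```

For an additive model such as `lambda b: b.sum(dim=1)`, every sampled term cancels exactly, matching the intuition that the interaction measures non-additive effects between variables.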