Contextual Dropout: An Efficient Sample-Dependent Dropout Module
Authors: Xinjie Fan, Shujian Zhang, Korawat Tanwisuth, Xiaoning Qian, Mingyuan Zhou
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results show that the proposed method outperforms baseline methods in terms of both accuracy and quality of uncertainty estimation. |
| Researcher Affiliation | Academia | The University of Texas at Austin; Texas A&M University |
| Pseudocode | Yes | Algorithm 1: Bernoulli contextual dropout with sequential ARM (a hedged layer sketch appears below the table) |
| Open Source Code | No | The paper states 'Experiments are conducted using the code of Yu et al. (2019) as basis' but does not provide a statement or link for the open-sourcing of their own contextual dropout implementation. |
| Open Datasets | Yes | We apply the proposed method to three representative types of NN layers: fully connected, convolutional, and attention layers with applications on MNIST (LeCun et al., 2010), CIFAR (Krizhevsky et al., 2009), ImageNet (Deng et al., 2009), and VQA-v2 (Goyal et al., 2017). |
| Dataset Splits | Yes | For hyperparameter tuning, we hold out 10,000 samples randomly selected from the training set for validation. We use the chosen hyperparameters to train on the full training set (60,000 samples) and evaluate on the testing set (10,000 samples). |
| Hardware Specification | Yes | All experiments are conducted using a single Nvidia Tesla V100 GPU. |
| Software Dependencies | No | The paper mentions software like Adam optimizer, Nesterov Momentum optimizer, and ReLU/Leaky ReLU activations, but does not specify version numbers for these or other key software dependencies (e.g., Python, PyTorch/TensorFlow versions). |
| Experiment Setup | Yes | All models are trained for 200 epochs with batch size 128 and the Adam optimizer (Kingma & Ba, 2014) with β1 = 0.9, β2 = 0.999; the learning rate is 0.001. (A training-setup sketch appears below the table.) |
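The code-related rows above ("Pseudocode" and "Experiment Setup") are illustrated by the two sketches below. Neither comes from the authors, since no implementation is released; all class names, layer sizes, and the placeholder classifier are assumptions.

First, a minimal PyTorch sketch of a sample-dependent Bernoulli dropout layer in the spirit of Algorithm 1. The logit subnetwork and the inverted-dropout rescaling are assumptions, and the paper's sequential ARM gradient estimator is not implemented; masks are simply sampled in the forward pass.

```python
import torch
import torch.nn as nn


class ContextualBernoulliDropout(nn.Module):
    """Drops units with probabilities predicted from the layer's own input.

    Hypothetical sketch: the paper trains such dropout logits with the ARM
    estimator (Algorithm 1), which is not implemented here.
    """

    def __init__(self, num_features: int, hidden: int = 32):
        super().__init__()
        # Lightweight subnetwork mapping the input to per-unit keep logits.
        self.logit_net = nn.Sequential(
            nn.Linear(num_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_features),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        keep_prob = torch.sigmoid(self.logit_net(x))     # sample-dependent keep probabilities
        if self.training:
            mask = torch.bernoulli(keep_prob)            # one Bernoulli mask per sample and unit
            return x * mask / keep_prob.clamp_min(1e-6)  # inverted-dropout rescaling (assumption)
        return x                                         # deterministic pass at evaluation time
```

Second, a sketch of the reported MNIST setup: 10,000 of the 60,000 training samples held out for validation, 200 epochs, batch size 128, Adam with β1 = 0.9, β2 = 0.999, and learning rate 0.001. The dataset loading, the random split, and the tiny placeholder classifier are assumptions; the paper's actual architectures are not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

# Split quoted in the table: 10,000 of the 60,000 training samples held out for validation.
train_full = datasets.MNIST("data", train=True, download=True, transform=transforms.ToTensor())
test_set = datasets.MNIST("data", train=False, download=True, transform=transforms.ToTensor())
train_set, val_set = random_split(train_full, [50_000, 10_000])

# Placeholder classifier; the paper's models would interleave contextual dropout layers.
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

# Reported settings: Adam, learning rate 0.001, beta1 = 0.9, beta2 = 0.999.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))
train_loader = DataLoader(train_set, batch_size=128, shuffle=True)

for epoch in range(200):  # reported: 200 epochs with batch size 128
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = F.cross_entropy(model(images), labels)
        loss.backward()
        optimizer.step()
    # val_set would be used here for hyperparameter tuning; per the table, the paper then
    # retrains on the full 60,000-sample training set and evaluates on test_set.
```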