Contextual Dropout: An Efficient Sample-Dependent Dropout Module

Authors: Xinjie Fan, Shujian Zhang, Korawat Tanwisuth, Xiaoning Qian, Mingyuan Zhou

ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental results show that the proposed method outperforms baseline methods in terms of both accuracy and quality of uncertainty estimation.
Researcher Affiliation | Academia | The University of Texas at Austin, Texas A&M University
Pseudocode | Yes | Algorithm 1: Bernoulli contextual dropout with sequential ARM. (A minimal sketch of the contextual dropout forward pass follows the table.)
Open Source Code | No | The paper states 'Experiments are conducted using the code of Yu et al. (2019) as basis' but does not provide a statement or link for open-sourcing its own contextual dropout implementation.
Open Datasets | Yes | We apply the proposed method to three representative types of NN layers: fully connected, convolutional, and attention layers, with applications on MNIST (LeCun et al., 2010), CIFAR (Krizhevsky et al., 2009), ImageNet (Deng et al., 2009), and VQA-v2 (Goyal et al., 2017).
Dataset Splits | Yes | For hyperparameter tuning, we hold out 10,000 samples randomly selected from the training set for validation. We use the chosen hyperparameters to train on the full training set (60,000 samples) and evaluate on the testing set (10,000 samples). (A split sketch follows the table.)
Hardware Specification | Yes | All experiments are conducted using a single Nvidia Tesla V100 GPU.
Software Dependencies | No | The paper mentions software such as the Adam optimizer, the Nesterov momentum optimizer, and ReLU/Leaky ReLU activations, but does not specify version numbers for these or other key software dependencies (e.g., Python, PyTorch/TensorFlow versions).
Experiment Setup | Yes | All models are trained for 200 epochs with batch size 128 and the Adam optimizer (Kingma & Ba, 2014) (β1 = 0.9, β2 = 0.999). The learning rate is 0.001. (A configuration sketch follows the table.)
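The Pseudocode row above refers to Algorithm 1 (Bernoulli contextual dropout with sequential ARM). As a rough illustration only, the following PyTorch sketch shows the forward pass of a sample-dependent Bernoulli dropout layer: a small context network maps each sample's activations to per-unit keep probabilities, from which binary gates are drawn. The class and layer names (`ContextualDropout`, `context_net`, the hidden width) are assumptions, and the ARM gradient estimator the paper uses to train the dropout distribution is not implemented here.

```python
import torch
import torch.nn as nn

class ContextualDropout(nn.Module):
    """Minimal sketch of sample-dependent Bernoulli dropout.

    A lightweight context network maps each sample's activations to
    per-unit keep probabilities, so gates are sampled per sample rather
    than shared across the batch. The sequential ARM estimator used in
    the paper to train these probabilities is NOT implemented here.
    """

    def __init__(self, num_features, hidden=32):
        super().__init__()
        # Hypothetical two-layer context network producing dropout logits.
        self.context_net = nn.Sequential(
            nn.Linear(num_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_features),
        )

    def forward(self, x):
        if not self.training:
            return x  # with inverted scaling, no gating is applied at test time
        logits = self.context_net(x)       # per-sample, per-unit logits
        keep_prob = torch.sigmoid(logits)  # keep probabilities in (0, 1)
        gate = torch.bernoulli(keep_prob)  # sample-dependent Bernoulli gates
        return x * gate / keep_prob.clamp_min(1e-6)  # inverted-dropout scaling
```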
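The Dataset Splits row describes holding out 10,000 of MNIST's 60,000 training samples for validation. A minimal sketch of that split, assuming torchvision, `torch.utils.data.random_split`, a `./data` directory, and an arbitrary seed (none of which are specified in the paper):

```python
import torch
from torch.utils.data import random_split, DataLoader
from torchvision import datasets, transforms

# Hypothetical data directory; adjust to your environment.
train_full = datasets.MNIST("./data", train=True, download=True,
                            transform=transforms.ToTensor())
test_set = datasets.MNIST("./data", train=False, download=True,
                          transform=transforms.ToTensor())

# Hold out 10,000 of the 60,000 training samples for hyperparameter tuning;
# the paper then retrains on the full training set with the chosen settings.
generator = torch.Generator().manual_seed(0)  # assumed seed, not from the paper
train_set, val_set = random_split(train_full, [50_000, 10_000], generator=generator)

train_loader = DataLoader(train_set, batch_size=128, shuffle=True)
val_loader = DataLoader(val_set, batch_size=128)
test_loader = DataLoader(test_set, batch_size=128)
```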
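The Experiment Setup row reports 200 epochs, batch size 128, and Adam with β1 = 0.9, β2 = 0.999 and a learning rate of 0.001. A minimal training-loop sketch under those settings; the placeholder model and loss are assumptions, and `train_loader` is taken from the split sketch above:

```python
import torch
import torch.nn as nn

# Hyperparameters reported in the Experiment Setup row.
EPOCHS = 200
BATCH_SIZE = 128
LEARNING_RATE = 1e-3

# Placeholder model and loss for illustration; the actual architectures
# in the paper vary per experiment (fully connected, convolutional, attention).
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
criterion = nn.CrossEntropyLoss()

optimizer = torch.optim.Adam(model.parameters(),
                             lr=LEARNING_RATE, betas=(0.9, 0.999))

# `train_loader` as built in the dataset-split sketch above.
for epoch in range(EPOCHS):
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
```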