Quantifying Uncertainties in Natural Language Processing Tasks

Authors: Yijun Xiao, William Yang Wang (pp. 7322-7329)

AAAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | With empirical experiments on sentiment analysis, named entity recognition, and language modeling using convolutional and recurrent neural network models, we show that explicitly modeling uncertainties is not only necessary to measure output confidence levels, but also useful at enhancing model performances in various NLP tasks.
Researcher Affiliation | Academia | Yijun Xiao, William Yang Wang, University of California, Santa Barbara, {yijunxiao,william}@cs.ucsb.edu
Pseudocode | No | No pseudocode or clearly labeled algorithm blocks were found in the paper. The paper presents mathematical formulations but not procedural algorithms.
Open Source Code | No | The paper does not provide any concrete access information (e.g., a specific repository link or an explicit statement of code release) for the methodology's source code.
Open Datasets | Yes | We use four large scale datasets containing document reviews as in (Tang, Qin, and Liu 2015). Specifically, we use IMDB movie review data (Diao et al. 2014) and Yelp restaurant review datasets from Yelp Dataset Challenge in 2013, 2014 and 2015.
Dataset Splits | Yes | Data splits are the same as in (Tang, Qin, and Liu 2015; Diao et al. 2014). Model with best performance on the validation set is chosen to be evaluated on the test set. We use the standard Penn Treebank (PTB), a standard benchmark in the field.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions software components like 'Adam' and models like 'CNN' and 'LSTM', but it does not specify version numbers for any libraries, frameworks, or programming languages (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup | Yes | Sentiment analysis: "embedding size is set to 300; three different kernel sizes are used in all models and they are chosen from [(1,2,3), (2,3,4), (3,4,5)]; number of feature maps for each kernel is 100; dropout (Srivastava et al. 2014) is applied between layers and dropout rate is 0.5. To evaluate model uncertainty and input uncertainty, 10 samples are drawn from the approximated posterior to estimate the output mean and variance. Adam (Kingma and Ba 2014) is adopted in all experiments with learning rate chosen from [3e-4, 1e-3, 3e-3] and weight decay from [3e-5, 1e-4, 3e-4]. Batch size is set to 32 and training runs for 48 epochs with 2,000 iterations per epoch for Yelp 2013 and IMDB, and 5,000 iterations per epoch for Yelp 2014 and 2015." Named entity recognition: "Word embedding size is 200 and hidden size in each direction is 200; dropout probability is fixed at 0.5; other hyperparameters related to quantifying uncertainties are the same with previous experiment setups. For training, we use Adam optimizer (Kingma and Ba 2014). Learn rate is selected from [3e-4, 1e-3, 3e-4] and weight decay is chosen from [0, 1e-5, 1e-4]. Training runs for 100 epochs with each epoch consisting of 2,000 randomly sampled mini-batches. Batch size is 32." Language modeling: "The model is a two-layer LSTM with hidden size 650. Dropout rate is fixed at 0.5. Number of samples for MC dropout is set to 50."
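
The sentiment-analysis configuration above pairs a text CNN (embedding size 300, kernel sizes such as (3, 4, 5), 100 feature maps per kernel, dropout 0.5) with Monte Carlo dropout, drawing 10 stochastic forward passes to estimate the output mean and variance. The paper releases no code and does not name a framework, so the following PyTorch snippet is only a minimal sketch under those reported settings; the names SentimentCNN and mc_predict are hypothetical, and sentiment prediction is treated as plain classification here purely for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SentimentCNN(nn.Module):
    """Text CNN per the reported setup: embedding size 300, kernel sizes
    (3, 4, 5), 100 feature maps per kernel, dropout 0.5 between layers."""
    def __init__(self, vocab_size, num_classes, emb_dim=300,
                 kernel_sizes=(3, 4, 5), num_maps=100, dropout=0.5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, num_maps, k) for k in kernel_sizes])
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(num_maps * len(kernel_sizes), num_classes)

    def forward(self, tokens):                     # tokens: (batch, seq_len)
        x = self.embed(tokens).transpose(1, 2)     # (batch, emb_dim, seq_len)
        feats = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        h = self.dropout(torch.cat(feats, dim=1))  # (batch, 300)
        return self.fc(h)                          # class logits

def mc_predict(model, tokens, num_samples=10):
    """Monte Carlo dropout: keep dropout active at prediction time and draw
    num_samples stochastic passes (10 in the paper's setup) to estimate the
    predictive mean and variance, i.e. the output uncertainty."""
    model.train()                                  # leaves dropout enabled
    with torch.no_grad():
        probs = torch.stack(
            [F.softmax(model(tokens), dim=-1) for _ in range(num_samples)])
    return probs.mean(dim=0), probs.var(dim=0)

# Training would follow the reported grid: Adam with learning rate in
# [3e-4, 1e-3, 3e-3], weight decay in [3e-5, 1e-4, 3e-4], batch size 32, e.g.
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)

The NER and language-modeling setups would follow the same MC-dropout pattern, with a bidirectional LSTM (embedding and per-direction hidden size 200) or a two-layer LSTM (hidden size 650, 50 MC samples) in place of the CNN.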