Quantifying Uncertainties in Natural Language Processing Tasks
Authors: Yijun Xiao, William Yang Wang
AAAI 2019, pp. 7322-7329 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | With empirical experiments on sentiment analysis, named entity recognition, and language modeling using convolutional and recurrent neural network models, we show that explicitly modeling uncertainties is not only necessary to measure output confidence levels, but also useful at enhancing model performances in various NLP tasks. |
| Researcher Affiliation | Academia | Yijun Xiao, William Yang Wang University of California, Santa Barbara {yijunxiao,william}@cs.ucsb.edu |
| Pseudocode | No | No pseudocode or clearly labeled algorithm blocks were found in the paper. The paper presents mathematical formulations but not procedural algorithms. |
| Open Source Code | No | The paper does not provide any concrete access information (e.g., a specific repository link, an explicit statement of code release) for the methodology's source code. |
| Open Datasets | Yes | We use four large scale datasets containing document reviews as in (Tang, Qin, and Liu 2015). Specifically, we use IMDB movie review data (Diao et al. 2014) and Yelp restaurant review datasets from Yelp Dataset Challenge in 2013, 2014 and 2015. |
| Dataset Splits | Yes | Data splits are the same as in (Tang, Qin, and Liu 2015; Diao et al. 2014). Model with best performance on the validation set is chosen to be evaluated on the test set. We use the standard Penn Treebank (PTB), a standard benchmark in the field. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions methods like the 'Adam' optimizer and models like 'CNN' and 'LSTM', but it does not specify version numbers for any libraries, frameworks, or programming languages (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | Experiment Setup for Sentiment Analysis: "embedding size is set to 300; three different kernel sizes are used in all models and they are chosen from [(1,2,3), (2,3,4), (3,4,5)]; number of feature maps for each kernel is 100; dropout (Srivastava et al. 2014) is applied between layers and dropout rate is 0.5. To evaluate model uncertainty and input uncertainty, 10 samples are drawn from the approximated posterior to estimate the output mean and variance. Adam (Kingma and Ba 2014) is adopted in all experiments with learning rate chosen from [3e-4, 1e-3, 3e-3] and weight decay from [3e-5, 1e-4, 3e-4]. Batch size is set to 32 and training runs for 48 epochs with 2,000 iterations per epoch for Yelp 2013 and IMDB, and 5,000 iterations per epoch for Yelp 2014 and 2015." Experiment Setup for Named Entity Recognition: "Word embedding size is 200 and hidden size in each direction is 200; dropout probability is fixed at 0.5; other hyperparameters related to quantifying uncertainties are the same with previous experiment setups. For training, we use Adam optimizer (Kingma and Ba 2014). Learn rate is selected from [3e-4, 1e-3, 3e-4] and weight decay is chosen from [0, 1e-5, 1e-4]. Training runs for 100 epochs with each epoch consisting of 2,000 randomly sampled mini-batches. Batch size is 32." Experiment Setting for Language Modeling: "The model is a two-layer LSTM with hidden size 650. Dropout rate is fixed at 0.5. Number of samples for MC dropout is set to 50." |
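
The setups quoted above rely on MC dropout: dropout stays active at prediction time and several stochastic forward passes (10 samples for sentiment analysis and NER, 50 for language modeling) are used to estimate an output mean and variance. The snippet below is a minimal sketch of that sampling loop, assuming a generic PyTorch model; it illustrates the technique rather than the authors' implementation, which, per the Open Source Code entry above, is not released.

```python
import torch
import torch.nn as nn


def mc_dropout_predict(model: nn.Module, x: torch.Tensor, n_samples: int = 10):
    """Estimate a predictive mean and variance with MC dropout.

    The paper reports 10 posterior samples for sentiment analysis and NER
    and 50 for language modeling; the model and this call signature are
    illustrative assumptions, not the authors' code.
    """
    model.train()  # keep dropout layers active so each forward pass is a stochastic sample
    with torch.no_grad():
        samples = torch.stack([model(x) for _ in range(n_samples)], dim=0)
    mean = samples.mean(dim=0)     # averaged prediction used as the output
    variance = samples.var(dim=0)  # spread across samples, read as an uncertainty estimate
    return mean, variance
```

Keeping the module in train mode only to re-enable dropout, while wrapping the passes in `torch.no_grad()`, is one common way to realize MC dropout; the paper does not spell out its exact mechanism.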
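The sentiment-analysis hyperparameters reported in the Experiment Setup row (embedding size 300, 100 feature maps per kernel, dropout 0.5, batch size 32, and grids over kernel sizes, learning rates, and weight decays) can likewise be written down as a small search grid. The sketch below enumerates that grid and builds the reported Adam optimizer; the helper names and the use of a generic PyTorch module in place of the paper's CNN are assumptions for illustration.

```python
from itertools import product

import torch

# Hyperparameters as reported for the sentiment-analysis experiments.
EMBEDDING_SIZE = 300
FEATURE_MAPS = 100
DROPOUT = 0.5
BATCH_SIZE = 32
KERNEL_SIZES = [(1, 2, 3), (2, 3, 4), (3, 4, 5)]
LEARNING_RATES = [3e-4, 1e-3, 3e-3]
WEIGHT_DECAYS = [3e-5, 1e-4, 3e-4]


def configurations():
    """Enumerate the hyperparameter grid the paper reports tuning over."""
    for kernels, lr, wd in product(KERNEL_SIZES, LEARNING_RATES, WEIGHT_DECAYS):
        yield {
            "embedding_size": EMBEDDING_SIZE,
            "feature_maps": FEATURE_MAPS,
            "dropout": DROPOUT,
            "batch_size": BATCH_SIZE,
            "kernel_sizes": kernels,
            "learning_rate": lr,
            "weight_decay": wd,
        }


def make_optimizer(model: torch.nn.Module, cfg: dict) -> torch.optim.Adam:
    # Adam with the reported learning rate and weight decay; the CNN over
    # word embeddings itself is not reproduced here.
    return torch.optim.Adam(
        model.parameters(),
        lr=cfg["learning_rate"],
        weight_decay=cfg["weight_decay"],
    )
```

The best configuration would then be selected on the validation split, matching the paper's statement that the model with the best validation performance is evaluated on the test set.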