Uncertainty Estimation and Calibration with Finite-State Probabilistic RNNs
Authors: Cheng Wang, Carolin Lawrence, Mathias Niepert
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We explore the behavior of ST-τ in a variety of tasks and settings. First, we show that ST-τ can learn deterministic and probabilistic automata from data. Second, we demonstrate on real-world classification tasks that ST-τ learns well calibrated models. Third, we show that ST-τ is competitive in out-of-distribution detection tasks. Fourth, in a reinforcement learning task, we find that ST-τ is able to trade off exploration and exploitation behavior better than existing methods. An implementation is available. |
| Researcher Affiliation | Industry | Cheng Wang*, Carolin Lawrence*, Mathias Niepert, NEC Laboratories Europe {cheng.wang,carolin.lawrence,mathias.niepert}@neclab.eu |
| Pseudocode | Yes | Algorithm 1 Extracting DFAs with Uncertainty Information; Algorithm 2 Extracting PAs with ST-τ |
| Open Source Code | Yes | An implementation is available: https://github.com/nec-research/st_tau |
| Open Datasets | Yes | The first task is heartbeat classification with 5 classes, where we use the MIT-BIH arrhythmia dataset (Goldberger et al., 2000; Moody & Mark, 2001). ... The second task is sentiment analysis, where natural language text is given as input and the problem is binary sentiment classification. We use the IMDB dataset (Maas et al., 2011)... We consider a time-series forecasting regression task using the individual household electric power consumption dataset: http://archive.ics.uci.edu/ml/datasets/Individual+household+electric+power+consumption |
| Dataset Splits | Yes | Train / validation / test splits: BIH 78k / 8k / 21k; IMDB 23k / 2k / 25k. On BIH, we use the training/test split of Kachuee et al. (2018); however, we additionally split off 10% of the training data to use as a validation set. On IMDB, the original dataset consists of 25k training and 25k test samples; we split off 2k from the training set as a validation set. (A sketch reproducing these splits follows the table.) |
| Hardware Specification | No | No specific hardware details (like CPU/GPU models, memory, or cloud instances) were mentioned for running experiments. |
| Software Dependencies | No | All models contain only a single LSTM layer, are implemented in Tensorflow (Abadi et al., 2015), and use the ADAM (Kingma & Ba, 2015) optimizer with initial learning rate 0.001. (TensorFlow version not specified). |
| Experiment Setup | Yes | Table 4: Overview of the hyperparameters per dataset (IMDB / BIH). Hidden dim. 256 / 128; Learning rate 0.001 / 0.001; Batch size 8 / 256; Validation rate 1k / 1k; Maximum validations 20 / 50; ST-τ # states 2 / 5; BBB µ 0.0 / 0.01; BBB ρ -3 / -3; VD Prob. 0.1 / 0.05. (A model-setup sketch using these values follows the table.) |
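The validation splits in the "Dataset Splits" row are simple hold-outs from the published training sets. Below is a minimal sketch of how they could be reproduced; the loaders `load_bih_train` and `load_imdb_train` and the random seed are our assumptions, not part of the paper.

```python
from sklearn.model_selection import train_test_split

# BIH: start from the Kachuee et al. (2018) train/test split, then hold
# out 10% of the training data as validation (the seed is our assumption).
X_train, y_train = load_bih_train()  # hypothetical loader, not from the paper
X_tr, X_val, y_tr, y_val = train_test_split(
    X_train, y_train, test_size=0.10, random_state=0)

# IMDB: 25k train / 25k test; hold out 2k of the training set as validation.
texts, labels = load_imdb_train()    # hypothetical loader, not from the paper
txt_tr, txt_val, lab_tr, lab_val = train_test_split(
    texts, labels, test_size=2000, random_state=0)
```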
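The "Software Dependencies" and "Experiment Setup" rows together pin down the backbone: a single LSTM layer trained with Adam at learning rate 0.001, with per-dataset hidden sizes and batch sizes from Table 4. The sketch below shows only that baseline configuration in Keras-style TensorFlow; it is not the authors' ST-τ cell (that lives in the linked repository), and the input featurization (e.g. an embedding layer for IMDB text) is omitted.

```python
import tensorflow as tf

def build_lstm_classifier(hidden_dim: int, num_classes: int) -> tf.keras.Model:
    """Single-LSTM-layer classifier matching the reported training setup."""
    model = tf.keras.Sequential([
        tf.keras.layers.LSTM(hidden_dim),                         # one LSTM layer
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),  # Adam, lr 0.001
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model

# Per-dataset settings from Table 4 (batch sizes are passed at fit time).
imdb_model = build_lstm_classifier(hidden_dim=256, num_classes=2)  # batch size 8
bih_model = build_lstm_classifier(hidden_dim=128, num_classes=5)   # batch size 256
```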