Uncertainty Estimation and Calibration with Finite-State Probabilistic RNNs

Authors: Cheng Wang, Carolin Lawrence, Mathias Niepert

Venue: ICLR 2021

Reproducibility assessment: each entry lists the variable, the extracted result, and the supporting LLM response.
Research Type: Experimental. "We explore the behavior of ST-τ in a variety of tasks and settings. First, we show that ST-τ can learn deterministic and probabilistic automata from data. Second, we demonstrate on real-world classification tasks that ST-τ learns well-calibrated models. Third, we show that ST-τ is competitive in out-of-distribution detection tasks. Fourth, in a reinforcement learning task, we find that ST-τ is able to trade off exploration and exploitation behavior better than existing methods. An implementation is available."
Researcher Affiliation: Industry. "Cheng Wang*, Carolin Lawrence*, Mathias Niepert; NEC Laboratories Europe; {cheng.wang,carolin.lawrence,mathias.niepert}@neclab.eu"
Pseudocode: Yes. "Algorithm 1: Extracting DFAs with Uncertainty Information; Algorithm 2: Extracting PAs with ST-τ."
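Both extraction algorithms read off the transition probabilities that ST-τ learns between a finite set of states. Below is a minimal sketch of one plausible reading of that mechanism, assuming ST-τ samples discrete transitions over k learnable state embeddings with a straight-through Gumbel-softmax at temperature τ. All names, shapes, and sizes here are illustrative, not the authors' implementation.

```python
# Sketch: straight-through Gumbel-softmax transition over a finite state set.
# Assumes this is the core mechanism behind ST-tau; shapes are illustrative.
import tensorflow as tf

def st_gumbel_softmax(logits, tau=1.0):
    """Sample a one-hot next-state vector; gradients flow via the soft sample."""
    gumbel = -tf.math.log(-tf.math.log(
        tf.random.uniform(tf.shape(logits), minval=1e-9, maxval=1.0)))
    y_soft = tf.nn.softmax((logits + gumbel) / tau, axis=-1)
    y_hard = tf.one_hot(tf.argmax(y_soft, axis=-1), tf.shape(logits)[-1])
    # Straight-through trick: forward pass uses the hard one-hot,
    # backward pass differentiates through the soft probabilities.
    return tf.stop_gradient(y_hard - y_soft) + y_soft

k, d = 5, 128                                    # number of states, state dim (assumed)
states = tf.Variable(tf.random.normal([k, d]))   # learnable state embeddings
logits = tf.random.normal([1, k])                # transition scores from the RNN cell
next_state = st_gumbel_softmax(logits, tau=0.5) @ states  # commit to one state
```

The hard one-hot forward pass is what makes automaton extraction natural: the model visits exactly one discrete state per step, and the learned transition distributions over the k states can be read off directly.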
Open Source Code: Yes. "An implementation is available." (Footnote 1: https://github.com/nec-research/st_tau)
Open Datasets: Yes. "The first task is heartbeat classification with 5 classes, where we use the MIT-BIH arrhythmia dataset (Goldberger et al., 2000; Moody & Mark, 2001). ... The second task is sentiment analysis, where natural language text is given as input and the problem is binary sentiment classification. We use the IMDB dataset (Maas et al., 2011). ... We consider a time-series forecasting regression task using the individual household electric power consumption dataset." (Footnote 3: http://archive.ics.uci.edu/ml/datasets/Individual+household+electric+power+consumption)
Dataset Splits: Yes. Reported splits:

Dataset | Train | Validation | Test
BIH | 78k | 8k | 21k
IMDB | 23k | 2k | 25k

"On BIH, we use the training/test split of Kachuee et al. (2018); however, we additionally split off 10% of the training data to use as a validation set. On IMDB, the original dataset consists of 25k training and 25k test samples; we split off 2k from the training set as a validation set."
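The reported splits are straightforward to reproduce; a minimal sketch using scikit-learn and placeholder arrays, since the authors' exact splitting procedure and random seed are not specified:

```python
# Sketch of the reported splits; arrays and seed are placeholders, not the
# authors' code. BIH: hold out 10% of train. IMDB: carve 2k out of 25k train.
import numpy as np
from sklearn.model_selection import train_test_split

bih_train_full = np.arange(87_000)   # placeholder for the Kachuee et al. train rows
imdb_train_full = np.arange(25_000)  # placeholder for the 25k IMDB training reviews

bih_train, bih_val = train_test_split(bih_train_full, test_size=0.10, random_state=0)
imdb_train, imdb_val = train_test_split(imdb_train_full, test_size=2_000, random_state=0)
```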
Hardware Specification: No. No specific hardware details (such as CPU/GPU models, memory, or cloud instances) are mentioned for running the experiments.
Software Dependencies: No. "All models contain only a single LSTM layer, are implemented in Tensorflow (Abadi et al., 2015), and use the ADAM (Kingma & Ba, 2015) optimizer with initial learning rate 0.001." (TensorFlow version not specified.)
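Given only the stated dependencies, a minimal sketch of the described baseline for IMDB: a single LSTM layer trained with Adam at learning rate 0.001, sized per Table 4. The vocabulary size and output head are assumptions, not taken from the paper.

```python
# Minimal sketch of the stated setup: one LSTM layer, Adam at lr 0.001.
# Hidden dim 256 follows Table 4 for IMDB; vocab size and head are assumed.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=20_000, output_dim=256),  # vocab size assumed
    tf.keras.layers.LSTM(256),                                    # single LSTM layer
    tf.keras.layers.Dense(1, activation="sigmoid"),               # binary sentiment
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="binary_crossentropy", metrics=["accuracy"])
```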
Experiment Setup: Yes. "Table 4: Overview of the different hyperparameters for the different datasets."

Hyperparameter | IMDB | BIH
Hidden dim. | 256 | 128
Learning rate | 0.001 | 0.001
Batch size | 8 | 256
Validation rate | 1k | 1k
Maximum validations | 20 | 50
ST-τ # states | 2 | 5
BBB µ | 0.0 | 0.01
BBB ρ | -3 | -3
VD prob. | 0.1 | 0.05
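For scripted sweeps, the Table 4 values collapse naturally into a config dict. Reading "validation rate 1k" as "validate every 1,000 training steps" is an assumption; the key names are illustrative.

```python
# Table 4 hyperparameters as a config dict. Key names are illustrative;
# "val_every_steps" assumes "validation rate 1k" means every 1,000 steps.
HPARAMS = {
    "IMDB": {"hidden_dim": 256, "lr": 1e-3, "batch_size": 8,
             "val_every_steps": 1000, "max_validations": 20,
             "st_tau_states": 2, "bbb_mu": 0.0, "bbb_rho": -3, "vd_prob": 0.10},
    "BIH":  {"hidden_dim": 128, "lr": 1e-3, "batch_size": 256,
             "val_every_steps": 1000, "max_validations": 50,
             "st_tau_states": 5, "bbb_mu": 0.01, "bbb_rho": -3, "vd_prob": 0.05},
}
```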