Uncertainty Estimation and Calibration with Finite-State Probabilistic RNNs
Authors: Cheng Wang, Carolin Lawrence, Mathias Niepert
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We explore the behavior of ST-τ in a variety of tasks and settings. First, we show that ST-τ can learn deterministic and probabilistic automata from data. Second, we demonstrate on real-world classification tasks that ST-τ learns well calibrated models. Third, we show that ST-τ is competitive in out-of-distribution detection tasks. Fourth, in a reinforcement learning task, we find that ST-τ is able to trade off exploration and exploitation behavior better than existing methods. An implementation is available. |
| Researcher Affiliation | Industry | Cheng Wang*, Carolin Lawrence*, Mathias Niepert, NEC Laboratories Europe {cheng.wang,carolin.lawrence,mathias.niepert}@neclab.eu |
| Pseudocode | Yes | Algorithm 1 Extracting DFAs with Uncertainty Information; Algorithm 2 Extracting PAs with ST-τ |
| Open Source Code | Yes | An implementation is available: https://github.com/nec-research/st_tau |
| Open Datasets | Yes | The first task is heartbeat classification with 5 classes, where we use the MIT-BIH arrhythmia dataset (Goldberger et al., 2000; Moody & Mark, 2001). ... The second task is sentiment analysis, where natural language text is given as input and the problem is binary sentiment classification. We use the IMDB dataset (Maas et al., 2011)... We consider a time-series forecasting regression task using the individual household electric power consumption dataset: http://archive.ics.uci.edu/ml/datasets/Individual+household+electric+power+consumption |
| Dataset Splits | Yes | Train / validation / test splits: BIH 78k / 8k / 21k; IMDB 23k / 2k / 25k. On BIH, we use the training/test split of Kachuee et al. (2018); however, we additionally split off 10% of the training data to use as a validation set. On IMDB, the original dataset consists of 25k training and 25k test samples; we split off 2k from the training set as a validation set. (A sketch reproducing these splits follows the table.) |
| Hardware Specification | No | No specific hardware details (like CPU/GPU models, memory, or cloud instances) were mentioned for running experiments. |
| Software Dependencies | No | All models contain only a single LSTM layer, are implemented in Tensorflow (Abadi et al., 2015), and use the ADAM (Kingma & Ba, 2015) optimizer with initial learning rate 0.001. (TensorFlow version not specified). |
| Experiment Setup | Yes | Table 4: Overview of the hyperparameters per dataset (IMDB / BIH). Hidden dim. 256 / 128; Learning rate 0.001 / 0.001; Batch size 8 / 256; Validation rate 1k / 1k; Maximum validations 20 / 50; ST-τ # states 2 / 5; BBB µ 0.0 / 0.01; BBB ρ -3 / -3; VD Prob. 0.1 / 0.05. (A model-setup sketch using these values follows the table.) |
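The validation splits in the "Dataset Splits" row are simple hold-outs from the published training sets. Below is a minimal sketch of how they could be reproduced; the loaders `load_bih_train` and `load_imdb_train` and the random seed are our assumptions, not part of the paper.

```python
from sklearn.model_selection import train_test_split

# BIH: start from the Kachuee et al. (2018) train/test split, then hold
# out 10% of the training data as validation (the seed is our assumption).
X_train, y_train = load_bih_train()  # hypothetical loader, not from the paper
X_tr, X_val, y_tr, y_val = train_test_split(
    X_train, y_train, test_size=0.10, random_state=0)

# IMDB: 25k train / 25k test; hold out 2k of the training set as validation.
texts, labels = load_imdb_train()    # hypothetical loader, not from the paper
txt_tr, txt_val, lab_tr, lab_val = train_test_split(
    texts, labels, test_size=2000, random_state=0)
```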
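The "Software Dependencies" and "Experiment Setup" rows together pin down the backbone: a single LSTM layer trained with Adam at learning rate 0.001, with per-dataset hidden sizes and batch sizes from Table 4. The sketch below shows only that baseline configuration in Keras-style TensorFlow; it is not the authors' ST-τ cell (that lives in the linked repository), and the input featurization (e.g. an embedding layer for IMDB text) is omitted.

```python
import tensorflow as tf

def build_lstm_classifier(hidden_dim: int, num_classes: int) -> tf.keras.Model:
    """Single-LSTM-layer classifier matching the reported training setup."""
    model = tf.keras.Sequential([
        tf.keras.layers.LSTM(hidden_dim),                         # one LSTM layer
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),  # Adam, lr 0.001
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model

# Per-dataset settings from Table 4 (batch sizes are passed at fit time).
imdb_model = build_lstm_classifier(hidden_dim=256, num_classes=2)  # batch size 8
bih_model = build_lstm_classifier(hidden_dim=128, num_classes=5)   # batch size 256
```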