Evidential Turing Processes

Authors: Melih Kandemir, Abdullah Akgül, Manuel Haussmann, Gozde Unal

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We observe our method on five classification tasks to be the only one that can excel all three aspects of total calibration with a single standalone predictor." "We observe on five real-world classification tasks that the Evidential Turing Process is the only model that excels simultaneously at model fit, class overlap quantification, and out-of-domain detection." "We benchmark ETP against the state of the art as addressed in the ablation plan in Table 1 according to the total calibration criteria developed in Sec. 2 on five real-world data sets."
Researcher Affiliation | Academia | Melih Kandemir, Dept. of Math and Computer Science, University of Southern Denmark, Odense, Denmark (kandemir@imada.sdu.dk); Abdullah Akgül, Department of Computer Engineering, Istanbul Technical University, Istanbul, Turkey (akgula15@itu.edu.tr); Manuel Haussmann, Department of Computer Science, Aalto University, Espoo, Finland (manuel.haussmann@aalto.fi); Gozde Unal, Department of Computer Engineering, Istanbul Technical University, Istanbul, Turkey (gozde.unal@itu.edu.tr)
Pseudocode | Yes | Figure 2: The Evidential Turing Process: (right) The ETP training routine:
while model not converged do
  for D_batch ∈ D do
    Choose D_C ⊆ D_batch
    M ← E_{Z ~ p_M(Z)}[r(Z, D_C)]
    λ ← λ − ∇_λ F_ETP(λ)
  end
end
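The training routine quoted above alternates a memory update (the expectation over Z of the read-out r on a context set D_C) with a gradient step on the objective F_ETP with respect to the variational parameters λ. A minimal, self-contained Python sketch of that loop structure, with toy scalar surrogates standing in for the paper's quantities (all function names and the surrogate definitions below are ours, not the authors'):

```python
def memory_update(memory, context):
    # Toy surrogate for M <- E_{Z ~ p_M(Z)}[r(Z, D_C)]:
    # blend the current memory with the mean of the context set.
    return 0.9 * memory + 0.1 * sum(context) / len(context)

def grad_F_etp(lam, memory):
    # Toy surrogate gradient of F_ETP w.r.t. lambda:
    # pull lambda toward the current memory state.
    return lam - memory

def train_etp(batches, lam=0.0, memory=0.0, lr=0.5,
              epochs=20, context_size=2):
    for _ in range(epochs):                  # "while model not converged"
        for batch in batches:                # "for D_batch in D"
            context = batch[:context_size]   # choose D_C subset of D_batch
            memory = memory_update(memory, context)
            lam -= lr * grad_F_etp(lam, memory)  # lambda gradient step
    return lam, memory
```

With the constant toy data `[[1.0, 1.0], [1.0, 1.0]]`, both the memory and λ drift toward 1.0, illustrating the alternating update pattern rather than the actual ETP objective.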
Open Source Code | Yes | "We provide a reference implementation of the proposed model and the experimental pipeline." (Footnote 2: https://github.com/ituvisionlab/EvidentialTuringProcess)
Open Datasets | Yes | "We observe on five real-world classification tasks that the Evidential Turing Process is the only model that excels simultaneously at model fit, class overlap quantification, and out-of-domain detection." FMNIST: LeNet5-sized architecture (see Table 3); the out-of-distribution data is the MNIST (LeCun et al., 2010) data set. CIFAR10: LeNet5-sized architecture (see Table 3); the out-of-distribution data is SVHN (Netzer et al., 2011). SVHN: LeNet5-sized architecture (see Table 3); the out-of-distribution data is the CIFAR10 data set. IMDB Sentiment Classification: an LSTM architecture with 64 embedding dimensions and 2 layers with 256 hidden dimensions.
Dataset Splits | No | The paper mentions a 'test split' for evaluation and a 'context set' for its process, but it does not explicitly give the percentages or counts for the training, validation, and test splits needed for reproducibility. It also mentions 'data augmentation', which is usually applied only to training data.
Hardware Specification | Yes | "All experiments are implemented in PyTorch (Paszke et al., 2019) version 1.7.1 and trained on a TITAN RTX."
Software Dependencies | Yes | "All experiments are implemented in PyTorch (Paszke et al., 2019) version 1.7.1 and trained on a TITAN RTX." "It is optimized for 400 epochs with the Adam optimizer (Kingma & Ba, 2015) using the PyTorch (Paszke et al., 2019) default parameters and a learning rate of 0.001."
Experiment Setup | Yes | FMNIST: "We train each model for 50 epochs with the Adam optimizer (Kingma & Ba, 2015) using a learning rate of 0.001." CIFAR10/SVHN: "We train each model for 100 epochs using the Adam optimizer (Kingma & Ba, 2015) with a learning rate of 0.0001." IMDB: "We train each model for 20 epochs using the SGD optimizer (Robbins & Monro, 1951) with a learning rate of 0.05 and 0.9 momentum."
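The per-dataset settings quoted above can be collected into one reference table; a minimal sketch in which the dictionary layout and key names are ours, while the values are transcribed directly from the quoted setup:

```python
# Per-dataset training configurations as reported in the paper's
# experiment setup (structure is ours; values are transcribed).
TRAIN_CONFIGS = {
    "FMNIST":  {"epochs": 50,  "optimizer": "Adam", "lr": 0.001},
    "CIFAR10": {"epochs": 100, "optimizer": "Adam", "lr": 0.0001},
    "SVHN":    {"epochs": 100, "optimizer": "Adam", "lr": 0.0001},
    "IMDB":    {"epochs": 20,  "optimizer": "SGD",  "lr": 0.05,
                "momentum": 0.9},
}
```

A table like this makes it easy to spot that the quoted setup still omits the split sizes flagged under "Dataset Splits" above.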