Evidential Turing Processes
Authors: Melih Kandemir, Abdullah Akgül, Manuel Haussmann, Gozde Unal
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We observe our method on five classification tasks to be the only one that can excel all three aspects of total calibration with a single standalone predictor. We observe on five real-world classification tasks that the Evidential Turing Process is the only model that excels simultaneously at model fit, class overlap quantification, and out-of-domain detection. We benchmark ETP against the state of the art as addressed in the ablation plan in Table 1 according to the total calibration criteria developed in Sec. 2 on five real-world data sets. |
| Researcher Affiliation | Academia | Melih Kandemir Dept of Math and Computer Science University of Southern Denmark Odense, Denmark kandemir@imada.sdu.dk Abdullah Akgül Department of Computer Engineering Istanbul Technical University Istanbul, Turkey akgula15@itu.edu.tr Manuel Haussmann Department of Computer Science Aalto University Espoo, Finland manuel.haussmann@aalto.fi Gozde Unal Department of Computer Engineering Istanbul Technical University Istanbul, Turkey gozde.unal@itu.edu.tr |
| Pseudocode | Yes | Figure 2: The Evidential Turing Process: (right) The ETP training routine. while Model not converged do: for D_batch ⊂ D do: Choose D_C ⊂ D_batch; M ← E_{Z ∼ p_M(Z)}[r(Z, D_C)]; λ ← λ + ∇_λ F_ETP(λ); end; end |
| Open Source Code | Yes | We provide a reference implementation of the proposed model and the experimental pipeline. (Footnote 2: https://github.com/ituvisionlab/EvidentialTuringProcess) |
| Open Datasets | Yes | We observe on five real-world classification tasks that the Evidential Turing Process is the only model that excels simultaneously at model fit, class overlap quantification, and out-of-domain detection. FMNIST. We use a LeNet5-sized architecture (see Table 3). The out-of-distribution data is the MNIST (LeCun et al., 2010) data set. CIFAR10. We use a LeNet5-sized architecture (see Table 3). The out-of-distribution data is the SVHN (Netzer et al., 2011) data set. SVHN. We use a LeNet5-sized architecture (see Table 3). The out-of-distribution data is the CIFAR10 data set. IMDB Sentiment Classification. We use an LSTM architecture with 64 embedding dimensions and 2 layers with 256 hidden dimensions. |
| Dataset Splits | No | The paper mentions a 'test split' for evaluation and a 'context set' for its process, but it does not provide explicit percentages or counts for the training, validation, and test splits needed for reproducibility. It also mentions data augmentation, which typically applies only to the training split. |
| Hardware Specification | Yes | All experiments are implemented in PyTorch (Paszke et al., 2019) version 1.7.1 and trained on a TITAN RTX. |
| Software Dependencies | Yes | All experiments are implemented in PyTorch (Paszke et al., 2019) version 1.7.1 and trained on a TITAN RTX. It is optimized for 400 epochs with the Adam optimizer (Kingma & Ba, 2015) using the PyTorch (Paszke et al., 2019) default parameters and a learning rate of 0.001. |
| Experiment Setup | Yes | FMNIST: We train each model for 50 epochs with the Adam optimizer (Kingma & Ba, 2015) using a learning rate of 0.001. CIFAR10/SVHN: We train each model for 100 epochs using the Adam optimizer (Kingma & Ba, 2015) with a learning rate of 0.0001. IMDB: We train each model for 20 epochs using the SGD optimizer (Robbins & Monro, 1951) with a learning rate of 0.05 and 0.9 momentum. |
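The training routine quoted in the Pseudocode row is a two-level loop: while the model has not converged, iterate over batches, choose a context subset D_C of each batch, refresh the memory M from it, and take a gradient step on the parameters λ. The sketch below reproduces only that loop shape; the objective, memory update r(·), and all names here are toy stand-ins assumed for illustration, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_objective(lam, batch):
    # Hypothetical stand-in for the gradient of the ETP objective F_ETP(lambda):
    # here, the gradient of a simple concave fit to the batch mean.
    return -2.0 * (lam - batch.mean(axis=0))

def train_etp_sketch(data, lam, lr=0.05, context_size=8, epochs=50, batch_size=32):
    """Loop shape of the Figure 2 routine (assumed reading):
    while not converged: for each batch, choose a context set D_C,
    update the memory M from it, then take a gradient-ascent step on lambda."""
    memory = np.zeros_like(lam)
    for _ in range(epochs):                      # "while Model not converged do"
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Choose D_C ⊂ D_batch: a random context subset of the batch.
            idx = rng.choice(len(batch), size=min(context_size, len(batch)),
                             replace=False)
            context = batch[idx]
            # M ← E[r(Z, D_C)] — approximated here by a running average
            # of the context mean (toy memory update).
            memory = 0.9 * memory + 0.1 * context.mean(axis=0)
            # λ ← λ + ∇_λ F_ETP(λ): gradient ascent on the stand-in objective.
            lam = lam + lr * grad_objective(lam, batch)
    return lam, memory

data = rng.normal(loc=3.0, scale=1.0, size=(256, 2))
lam, memory = train_etp_sketch(data, np.zeros(2))
print(lam)  # should settle near the data mean, roughly [3., 3.]
```

The point of the sketch is the control flow: the context set is re-drawn per batch and feeds the memory update, while λ is trained by ordinary stochastic gradient ascent on the objective.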