Evidential Turing Processes

Authors: Melih Kandemir, Abdullah Akgül, Manuel Haussmann, Gozde Unal

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We observe our method on five classification tasks to be the only one that can excel all three aspects of total calibration with a single standalone predictor." "We observe on five real-world classification tasks that the Evidential Turing Process is the only model that excels simultaneously at model fit, class overlap quantification, and out-of-domain detection." "We benchmark ETP against the state of the art as addressed in the ablation plan in Table 1 according to the total calibration criteria developed in Sec. 2 on five real-world data sets."
Researcher Affiliation | Academia | Melih Kandemir, Dept. of Math and Computer Science, University of Southern Denmark, Odense, Denmark (kandemir@imada.sdu.dk); Abdullah Akgül, Department of Computer Engineering, Istanbul Technical University, Istanbul, Turkey (akgula15@itu.edu.tr); Manuel Haussmann, Department of Computer Science, Aalto University, Espoo, Finland (manuel.haussmann@aalto.fi); Gozde Unal, Department of Computer Engineering, Istanbul Technical University, Istanbul, Turkey (gozde.unal@itu.edu.tr)
Pseudocode | Yes | Figure 2: The Evidential Turing Process: (right) The ETP training routine:
while model not converged do
  for D_batch ∈ D do
    Choose D_C ⊆ D_batch
    M ← E_{Z ~ p_M(Z)}[r(Z, D_C)]
    λ ← λ − ∇_λ F_ETP(λ)
  end
end
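The training routine quoted above alternates a memory update (the expectation over Z of the read-out r on a context set D_C) with a gradient step on the objective F_ETP with respect to the variational parameters λ. A minimal, self-contained Python sketch of that loop structure, with toy scalar surrogates standing in for the paper's quantities (all function names and the surrogate definitions below are ours, not the authors'):

```python
def memory_update(memory, context):
    # Toy surrogate for M <- E_{Z ~ p_M(Z)}[r(Z, D_C)]:
    # blend the current memory with the mean of the context set.
    return 0.9 * memory + 0.1 * sum(context) / len(context)

def grad_F_etp(lam, memory):
    # Toy surrogate gradient of F_ETP w.r.t. lambda:
    # pull lambda toward the current memory state.
    return lam - memory

def train_etp(batches, lam=0.0, memory=0.0, lr=0.5,
              epochs=20, context_size=2):
    for _ in range(epochs):                  # "while model not converged"
        for batch in batches:                # "for D_batch in D"
            context = batch[:context_size]   # choose D_C subset of D_batch
            memory = memory_update(memory, context)
            lam -= lr * grad_F_etp(lam, memory)  # lambda gradient step
    return lam, memory
```

With the constant toy data `[[1.0, 1.0], [1.0, 1.0]]`, both the memory and λ drift toward 1.0, illustrating the alternating update pattern rather than the actual ETP objective.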
Open Source Code | Yes | "We provide a reference implementation of the proposed model and the experimental pipeline." (Footnote 2: https://github.com/ituvisionlab/EvidentialTuringProcess)
Open Datasets | Yes | "We observe on five real-world classification tasks that the Evidential Turing Process is the only model that excels simultaneously at model fit, class overlap quantification, and out-of-domain detection." FMNIST: LeNet5-sized architecture (see Table 3); the out-of-distribution data is the MNIST (LeCun et al., 2010) data set. CIFAR10: LeNet5-sized architecture (see Table 3); the out-of-distribution data is SVHN (Netzer et al., 2011). SVHN: LeNet5-sized architecture (see Table 3); the out-of-distribution data is the CIFAR10 data set. IMDB Sentiment Classification: an LSTM architecture with 64 embedding dimensions and 2 layers with 256 hidden dimensions.
Dataset Splits | No | The paper mentions a 'test split' for evaluation and a 'context set' for its process, but it does not explicitly give the percentages or counts for the training, validation, and test splits needed for reproducibility. It also mentions 'data augmentation', which is usually applied only to training data.
Hardware Specification | Yes | "All experiments are implemented in PyTorch (Paszke et al., 2019) version 1.7.1 and trained on a TITAN RTX."
Software Dependencies | Yes | "All experiments are implemented in PyTorch (Paszke et al., 2019) version 1.7.1 and trained on a TITAN RTX." "It is optimized for 400 epochs with the Adam optimizer (Kingma & Ba, 2015) using the PyTorch (Paszke et al., 2019) default parameters and a learning rate of 0.001."
Experiment Setup | Yes | FMNIST: "We train each model for 50 epochs with the Adam optimizer (Kingma & Ba, 2015) using a learning rate of 0.001." CIFAR10/SVHN: "We train each model for 100 epochs using the Adam optimizer (Kingma & Ba, 2015) with a learning rate of 0.0001." IMDB: "We train each model for 20 epochs using the SGD optimizer (Robbins & Monro, 1951) with a learning rate of 0.05 and 0.9 momentum."
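The per-dataset settings quoted above can be collected into one reference table; a minimal sketch in which the dictionary layout and key names are ours, while the values are transcribed directly from the quoted setup:

```python
# Per-dataset training configurations as reported in the paper's
# experiment setup (structure is ours; values are transcribed).
TRAIN_CONFIGS = {
    "FMNIST":  {"epochs": 50,  "optimizer": "Adam", "lr": 0.001},
    "CIFAR10": {"epochs": 100, "optimizer": "Adam", "lr": 0.0001},
    "SVHN":    {"epochs": 100, "optimizer": "Adam", "lr": 0.0001},
    "IMDB":    {"epochs": 20,  "optimizer": "SGD",  "lr": 0.05,
                "momentum": 0.9},
}
```

A table like this makes it easy to spot that the quoted setup still omits the split sizes flagged under "Dataset Splits" above.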