Being Bayesian about Categorical Probability

Authors: Taejong Joo, Uijung Chung, Min-Gwan Seo

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our extensive experiments show the effectiveness of being Bayesian about the categorical probability in improving generalization performance, uncertainty estimation, and calibration. In this section, we show the versatility of BM through extensive empirical evaluations. We first verify its improvement of the generalization error in image classification tasks (Section 5.1)."
Researcher Affiliation | Industry | Taejong Joo, Uijung Chung, Min-Gwan Seo (ESTsoft, Republic of Korea).
Pseudocode | No | The paper describes its methodology through mathematical equations and textual descriptions but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | "To support reproducibility, we release our code at: https://github.com/tjoo512/belief-matching-framework"
Open Datasets | Yes | "We evaluate the generalization performance of BM on CIFAR (Krizhevsky, 2009) with the pre-activation ResNet (He et al., 2016b). We next perform a large-scale experiment using ResNeXt-50 32x4d and ResNeXt-101 32x8d (Xie et al., 2017) on ImageNet (Russakovsky et al., 2015)."
Dataset Splits | Yes | "Table 1. Test classification error rates on CIFAR. Here, we split a train set of 50K examples into a train set of 40K examples and a validation set of 10K examples." "ImageNet contains approximately 1.3M training samples and 50K validation samples." (A split sketch follows this table.)
Hardware Specification | Yes | "We performed all experiments on a single workstation with 8 GPUs (NVIDIA GeForce RTX 2080 Ti)."
Software Dependencies | No | The paper mentions specific models (e.g., ResNet) and optimizers (e.g., Adam) but does not provide version numbers for any software libraries or dependencies used in the experiments (e.g., PyTorch, TensorFlow, or CUDA).
Experiment Setup | Yes | "However, we additionally use an initial learning rate warm-up and gradient clipping, which are extremely helpful for stable training of BM. Specifically, we use learning rates of [0.1ϵ, 0.2ϵ, 0.4ϵ, 0.6ϵ, 0.8ϵ] for the first five epochs when the reference learning rate is ϵ and clip gradients when the norm exceeds 1.0." (A training-loop sketch follows this table.)
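
As a companion to the Dataset Splits row, here is a minimal sketch of the quoted 40K/10K split of CIFAR's 50K training examples. It assumes PyTorch and torchvision; the fixed seed and the use of random_split are illustrative assumptions, not the authors' released implementation.

```python
# Illustrative sketch (not the authors' code): split CIFAR-10's 50K
# training images into 40K train / 10K validation, as quoted above.
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

full_train = datasets.CIFAR10(root="./data", train=True, download=True,
                              transform=transforms.ToTensor())
gen = torch.Generator().manual_seed(0)  # fixed seed is an assumption, for a reproducible split
train_set, val_set = random_split(full_train, [40_000, 10_000], generator=gen)
print(len(train_set), len(val_set))  # 40000 10000
```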
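
Likewise for the Experiment Setup row: a minimal PyTorch sketch of the five-epoch learning-rate warm-up and gradient-norm clipping described there. The tiny linear model, random batches, SGD optimizer, and reference learning rate of 0.1 are placeholders, not the paper's actual configuration.

```python
# Illustrative sketch of the warm-up and clipping described above; the
# tiny model and random data are stand-ins, not the paper's setup.
import torch
from torch import nn

model = nn.Linear(32 * 32 * 3, 10)          # stand-in for the ResNet used in the paper
loss_fn = nn.CrossEntropyLoss()
data = [(torch.randn(8, 32 * 32 * 3), torch.randint(0, 10, (8,)))
        for _ in range(4)]                  # dummy batches

reference_lr = 0.1                          # the reference learning rate ϵ (value assumed)
warmup = [0.1, 0.2, 0.4, 0.6, 0.8]          # multiples of ϵ for the first five epochs
optimizer = torch.optim.SGD(model.parameters(), lr=reference_lr)

for epoch in range(10):
    # Warm-up: scale the learning rate during the first five epochs.
    factor = warmup[epoch] if epoch < len(warmup) else 1.0
    for group in optimizer.param_groups:
        group["lr"] = reference_lr * factor
    for x, y in data:
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        # Clip gradients when their norm exceeds 1.0, per the quoted setup.
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
```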