Calibrating a Deep Neural Network with Its Predecessors

Authors: Linwei Tao, Minjing Dong, Daochang Liu, Changming Sun, Chang Xu

IJCAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments on various datasets, including CIFAR-10/100 [Krizhevsky, 2012] and Tiny-ImageNet [Deng et al., 2009] to evaluate the calibration performance.
Researcher Affiliation | Academia | 1 School of Computer Science, Faculty of Engineering, University of Sydney, Australia; 2 CSIRO’s Data61, Australia
Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper.
Open Source Code | Yes | Supplementary material and code are available at https://github.com/Linwei94/PCS
Open Datasets | Yes | We conduct experiments on various datasets, including CIFAR-10/100 [Krizhevsky, 2012] and Tiny-ImageNet [Deng et al., 2009] to evaluate the calibration performance.
Dataset Splits | Yes | We follow the same training and validation set split setting as [Mukhoti et al., 2020]. The learning rate is set to 0.1 for epoch 0 to 150, 0.01 for 150 to 250, and 0.001 for 250 until the end of training. For training on Tiny-ImageNet, we set Ttrain = 100. (See the configuration sketches after this table.)
Hardware Specification | Yes | All experiments are conducted on a single Tesla V-100 GPU with all random seeds set to 1.
Software Dependencies | No | The paper mentions 'Our code and results of comparison method are based on the public code and the pre-trained weight provided by [Mukhoti et al., 2020]' but does not provide specific version numbers for software dependencies such as Python or PyTorch.
Experiment Setup | Yes | For training on CIFAR-10/100, we set Ttrain = 350. The learning rate is set to 0.1 for epoch 0 to 150, 0.01 for 150 to 250, and 0.001 for 250 until the end of training... The fine-tuning learning rate is set to 10^-4 for CIFAR-10, 5x10^-4 for CIFAR-100, and 10^-3 for Tiny-ImageNet. The searching process is performed with Tse = 100 steps. The population size is S = 100... All networks are optimized using the SGD optimizer with a weight decay at 5x10^-4 and a momentum of 0.9. The training batch size is set to 128.
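
The data handling described in the "Dataset Splits" and "Experiment Setup" rows can be approximated with a short PyTorch sketch. This is an illustration under stated assumptions, not the paper's code: the 45,000/5,000 CIFAR train/validation split is borrowed from the public Mukhoti et al. (2020) codebase that the paper says it follows (the exact split size is not quoted in this report), and the normalization statistics are commonly used CIFAR-10 values.

    # Hedged sketch: CIFAR-10 train/validation split in the style of the
    # Mukhoti et al. (2020) codebase. The 45,000/5,000 split is an assumption;
    # batch size 128 and seed 1 are the values reported above.
    import torch
    from torch.utils.data import DataLoader, random_split
    from torchvision import datasets, transforms

    transform = transforms.Compose([
        transforms.RandomCrop(32, padding=4),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
    ])

    full_train = datasets.CIFAR10(root="./data", train=True, download=True,
                                  transform=transform)
    generator = torch.Generator().manual_seed(1)  # "all random seeds set to 1"
    train_set, val_set = random_split(full_train, [45_000, 5_000], generator=generator)

    train_loader = DataLoader(train_set, batch_size=128, shuffle=True)
    val_loader = DataLoader(val_set, batch_size=128, shuffle=False)  # held-out validation split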
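
The optimizer, learning-rate schedule, seed, and epoch budget in the "Experiment Setup" row can likewise be assembled into a minimal training-loop sketch. The ResNet-50 backbone and plain cross-entropy loss are placeholders (the report does not quote the architectures or the paper's training objective), and train_loader comes from the split sketch above; the authoritative training code is in the linked PCS repository.

    # Hedged sketch of the reported CIFAR-10 training configuration:
    # SGD (lr 0.1, momentum 0.9, weight decay 5e-4), decay to 0.01 at epoch 150
    # and to 0.001 at epoch 250, Ttrain = 350 epochs, batch size 128, seed 1.
    import torch
    import torch.nn as nn
    from torchvision import models

    torch.manual_seed(1)                                  # "all random seeds set to 1"
    device = "cuda" if torch.cuda.is_available() else "cpu"

    model = models.resnet50(num_classes=10).to(device)    # placeholder backbone (assumption)
    criterion = nn.CrossEntropyLoss()                     # placeholder objective (assumption)

    optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                                momentum=0.9, weight_decay=5e-4)
    # MultiStepLR reproduces the 0.1 -> 0.01 -> 0.001 schedule at epochs 150 / 250.
    scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                     milestones=[150, 250], gamma=0.1)

    T_train = 350  # 100 for Tiny-ImageNet, per the report
    for epoch in range(T_train):
        model.train()
        for images, labels in train_loader:               # from the split sketch above
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        scheduler.step()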