Calibrating a Deep Neural Network with Its Predecessors
Authors: Linwei Tao, Minjing Dong, Daochang Liu, Changming Sun, Chang Xu
IJCAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on various datasets, including CIFAR-10/100 [Krizhevsky, 2012] and Tiny-ImageNet [Deng et al., 2009] to evaluate the calibration performance. |
| Researcher Affiliation | Academia | 1School of Computer Science, Faculty of Engineering, University of Sydney, Australia 2CSIRO’s Data61, Australia |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | Yes | Supplementary material and code are available at https://github.com/Linwei94/PCS |
| Open Datasets | Yes | We conduct experiments on various datasets, including CIFAR-10/100 [Krizhevsky, 2012] and Tiny-ImageNet [Deng et al., 2009] to evaluate the calibration performance. |
| Dataset Splits | Yes | We follow the same training and validation set split setting as [Mukhoti et al., 2020]. (A data-loading sketch follows the table.) |
| Hardware Specification | Yes | All experiments are conducted on a single Tesla V100 GPU with all random seeds set to 1. (A seeding sketch follows the table.) |
| Software Dependencies | No | The paper mentions 'Our code and results of comparison method are based on the public code and the pre-trained weight provided by [Mukhoti et al., 2020]' but does not provide specific version numbers for software dependencies like Python or PyTorch. |
| Experiment Setup | Yes | For training on CIFAR-10/100, we set Ttrain = 350; for Tiny-ImageNet, Ttrain = 100. The learning rate is set to 0.1 for epoch 0 to 150, 0.01 for 150 to 250, and 0.001 for 250 until the end of training... The fine-tuning learning rate is set to 10^-4 for CIFAR-10, 5×10^-4 for CIFAR-100, and 10^-3 for Tiny-ImageNet. The searching process is performed with Tse = 100 steps. The population size is S = 100... All networks are optimized using the SGD optimizer with a weight decay of 5×10^-4 and a momentum of 0.9. The training batch size is set to 128. (A training-configuration sketch follows the table.) |
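
The dataset-splits evidence defers to Mukhoti et al. [2020] without restating the split sizes. The sketch below shows one way to load CIFAR-10 with a held-out validation set in PyTorch/torchvision; the 5,000-image validation split, the transform, and the `cifar10_loaders` name are assumptions for illustration and are not taken from the released code.

```python
# Sketch of loading CIFAR-10 with a held-out validation split. The 45,000/5,000
# split is an assumption meant to mirror "the same split setting as Mukhoti et
# al. [2020]"; the exact sizes are not restated in the paper excerpt above.
import torch
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

def cifar10_loaders(root="./data", batch_size=128, val_size=5000, seed=1):
    transform = transforms.Compose([transforms.ToTensor()])
    full_train = datasets.CIFAR10(root, train=True, download=True, transform=transform)
    # Carve a validation set out of the official training set.
    train_set, val_set = random_split(
        full_train,
        [len(full_train) - val_size, val_size],
        generator=torch.Generator().manual_seed(seed),
    )
    test_set = datasets.CIFAR10(root, train=False, download=True, transform=transform)
    return (
        DataLoader(train_set, batch_size=batch_size, shuffle=True),
        DataLoader(val_set, batch_size=batch_size),
        DataLoader(test_set, batch_size=batch_size),
    )
```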
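For the hardware row, a small helper like the following is one way to realize "all random seeds set to 1" in a PyTorch codebase. Which libraries are actually seeded, and whether cuDNN determinism is enforced, is not stated in the paper, so those lines are assumptions.

```python
# Hypothetical seeding helper reflecting "all random seeds set to 1" on a
# single Tesla V100; the released code may seed a different set of libraries.
import random
import numpy as np
import torch

def set_seed(seed: int = 1):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Optional: trade throughput for deterministic cuDNN kernels (assumption).
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```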
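The experiment-setup row describes a standard step-wise SGD schedule. The sketch below wires those reported hyperparameters (learning rate 0.1 → 0.01 → 0.001 at epochs 150 and 250, weight decay 5×10^-4, momentum 0.9, batch size 128, Ttrain = 350 on CIFAR) into a plain PyTorch training loop. The model, loss, and dataloader are placeholders, and the PCS-specific predecessor search (Tse = 100 steps, population S = 100) and fine-tuning stage are not shown.

```python
# Minimal PyTorch sketch of the reported CIFAR training schedule. Assumption:
# the paper's released code is PyTorch-based; model/criterion/loader below are
# placeholders, and the PCS search and fine-tuning phases are omitted.
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import MultiStepLR

def build_optimizer_and_scheduler(model):
    # SGD with momentum 0.9 and weight decay 5e-4, as reported in the table.
    optimizer = SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
    # LR 0.1 for epochs 0-150, 0.01 for 150-250, 0.001 until the end of training.
    scheduler = MultiStepLR(optimizer, milestones=[150, 250], gamma=0.1)
    return optimizer, scheduler

def train(model, train_loader, criterion, epochs=350, device="cuda"):
    optimizer, scheduler = build_optimizer_and_scheduler(model)
    model.to(device).train()
    for epoch in range(epochs):              # Ttrain = 350 on CIFAR-10/100
        for inputs, targets in train_loader: # batch size 128 per the table
            inputs, targets = inputs.to(device), targets.to(device)
            optimizer.zero_grad()
            loss = criterion(model(inputs), targets)
            loss.backward()
            optimizer.step()
        scheduler.step()
    return model
```

For Tiny-ImageNet the same loop would be run with `epochs=100`, matching the reported Ttrain = 100.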