Scaling of Class-wise Training Losses for Post-hoc Calibration
Authors: Seungjin Jung, Seungmo Seo, Yonghyun Jeong, Jongwon Choi
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate the proposed framework by employing it in the various post-hoc calibration methods, which generally improves calibration performance while preserving accuracy, and discover through the investigation that our approach performs well with unbalanced datasets and untuned hyperparameters. |
| Researcher Affiliation | Collaboration | 1Department of Artificial Intelligence, Chung-Ang University, Seoul, Korea 2Naver CLOVA, Seongnam, Korea 3Department of Advanced Imaging, Chung-Ang University, Seoul, Korea. |
| Pseudocode | No | The paper describes its methods using prose and mathematical equations but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available online2. 2https://github.com/SeungjinJung/SCTL |
| Open Datasets | Yes | We use three different datasets and six different pre-trained models to train and evaluate the calibration methods. We separate the datasets into validation datasets and test datasets with the sizes of 25000/10000 for CIFAR10 and CIFAR100 datasets (Krizhevsky & Hinton, 2009) and 25000/25000 for ImageNet dataset (Deng et al., 2009). |
| Dataset Splits | Yes | We separate the datasets into validation datasets and test datasets with the sizes of 25000/10000 for CIFAR10 and CIFAR100 datasets (Krizhevsky & Hinton, 2009) and 25000/25000 for ImageNet dataset (Deng et al., 2009). |
| Hardware Specification | Yes | We conduct our experiments upon one RTX3090 environment. |
| Software Dependencies | No | The paper mentions optimizers like LBFGS and Adam but does not provide specific version numbers for any software dependencies or libraries used in the implementation. |
| Experiment Setup | Yes | We train baseline methods using a learning rate of 0.02, 1000 epochs, and a cross-entropy loss. TS, ETS, and CTS use LBFGS optimizer... and PTS utilizes Adam optimizer with 0.002 weight decay. We initialize α and β by 1.0 and 1.5, respectively, before their optimization. In the learning process, we use the hyperparameters referred to in Table 4. |
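
The Experiment Setup row notes that the baseline calibrators (TS, ETS, CTS) are fit with LBFGS using a learning rate of 0.02 over 1000 iterations and a cross-entropy loss. As a rough, hedged illustration of that kind of post-hoc fitting step only, the sketch below shows plain temperature scaling with PyTorch's LBFGS; it is not the authors' SCTL method (their code is in the linked repository), and the tensor names, shapes, and the `fit_temperature` helper are assumptions for the example.

```python
# Minimal temperature-scaling sketch (illustration only; not the SCTL code).
# Assumes `val_logits` [N, C] and `val_labels` [N] come from a pre-trained
# model evaluated on the held-out validation split described in the table.
import torch
import torch.nn.functional as F

def fit_temperature(val_logits: torch.Tensor, val_labels: torch.Tensor,
                    lr: float = 0.02, max_iter: int = 1000) -> float:
    """Fit a single temperature T minimizing cross-entropy on validation logits."""
    log_t = torch.zeros(1, requires_grad=True)  # optimize log T so T stays positive
    optimizer = torch.optim.LBFGS([log_t], lr=lr, max_iter=max_iter)

    def closure():
        optimizer.zero_grad()
        loss = F.cross_entropy(val_logits / log_t.exp(), val_labels)
        loss.backward()
        return loss

    optimizer.step(closure)
    return float(log_t.exp())

# Hypothetical usage: calibrate test-time probabilities with the fitted T.
# T = fit_temperature(val_logits, val_labels)
# calibrated_probs = torch.softmax(test_logits / T, dim=1)
```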