Logit Mixing Training for More Reliable and Accurate Prediction

Authors: Duhyeon Bang, Kyungjune Baek, Jiwoo Kim, Yunho Jeon, Jin-Hwa Kim, Jiwon Kim, Jongwuk Lee, Hyunjung Shim

IJCAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our extensive experimental results on the image- and language-based tasks demonstrate that Logit Mix achieves state-of-the-art performance among recent data augmentation techniques regarding calibration error and prediction accuracy.
Researcher Affiliation | Collaboration | Duhyeon Bang (SK T-Brain), Kyungjune Baek (School of Integrated Technology, Yonsei University), Jiwoo Kim (Department of Software, Sungkyunkwan University), Yunho Jeon (MOFL), Jin-Hwa Kim (NAVER AI Lab), Jiwon Kim (SK T-Brain), Jongwuk Lee (Department of Software, Sungkyunkwan University), and Hyunjung Shim (Kim Jaechul Graduate School of AI, KAIST)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. Methods are described in text and mathematical formulas.
Open Source Code | No | The paper does not provide concrete access to source code, such as a specific repository link or an explicit code-release statement for the described methodology.
Open Datasets | Yes | The image datasets include CIFAR100 [Krizhevsky and Hinton, 2009] (32×32 RGB images in 100 classes), Tiny ImageNet (64×64 RGB images in 100 classes) and ILSVRC2015 [Russakovsky et al., 2015] (256×256 RGB images in 1000 classes). Additionally, the General Language Understanding Evaluation (GLUE) benchmark [Wang et al., 2018].
Dataset Splits | Yes | The image datasets include CIFAR100 [Krizhevsky and Hinton, 2009] (32×32 RGB images in 100 classes), Tiny ImageNet (64×64 RGB images in 100 classes) and ILSVRC2015 [Russakovsky et al., 2015] (256×256 RGB images in 1000 classes). Additionally, the General Language Understanding Evaluation (GLUE) benchmark [Wang et al., 2018].
Hardware Specification | Yes | To train the models on all the datasets except for ILSVRC2015, we use a single Titan XP GPU with 12 GB memory. For ILSVRC2015, we utilize four V100 GPUs.
Software Dependencies | No | The paper mentions using "SGD optimization" and "BERT" but does not provide specific version numbers for any key software components or libraries (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | When fine-tuning BERT-Base (or BERT-Large), the batch size is 8, the learning rate is 2e-5, the max sequence length is 128, and the number of training epochs is 3 for all eight tasks. We use a beta distribution with α = 3.0 for λ.
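
The Experiment Setup row above quotes the reported GLUE fine-tuning hyperparameters and the Beta distribution (α = 3.0) used to draw the mixing coefficient λ. Below is a minimal sketch, assuming NumPy, that collects those reported values into a configuration and samples λ. The names FINETUNE_CONFIG and sample_lambda are illustrative, and the use of a symmetric Beta(α, α), as in mixup-style augmentation, is an assumption of this sketch; the paper only states that λ follows a Beta distribution with α = 3.0.

```python
import numpy as np
from typing import Optional

# Hyperparameters quoted in the Experiment Setup row (GLUE fine-tuning of
# BERT-Base / BERT-Large). The dict name and keys are illustrative.
FINETUNE_CONFIG = {
    "batch_size": 8,
    "learning_rate": 2e-5,
    "max_sequence_length": 128,
    "num_epochs": 3,       # same for all eight GLUE tasks
    "beta_alpha": 3.0,     # alpha of the Beta distribution used to draw lambda
}


def sample_lambda(alpha: float = FINETUNE_CONFIG["beta_alpha"],
                  rng: Optional[np.random.Generator] = None) -> float:
    """Draw a mixing coefficient lambda ~ Beta(alpha, alpha).

    Using a symmetric Beta(alpha, alpha), as in mixup-style augmentation,
    is an assumption of this sketch; the paper states only alpha = 3.0.
    """
    rng = rng or np.random.default_rng()
    return float(rng.beta(alpha, alpha))


if __name__ == "__main__":
    print("config:", FINETUNE_CONFIG)
    print("sampled lambda:", round(sample_lambda(), 3))
```

The logit-mixing loss itself is given only as text and formulas in the paper (see the Pseudocode row), so it is not reproduced here.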