TokenMixup: Efficient Attention-guided Token-level Data Augmentation for Transformers

Authors: Hyeong Kyu Choi, Joonmyung Choi, Hyunwoo J. Kim

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that our methods significantly improve the baseline models' performance on CIFAR and ImageNet-1K, while being more efficient than previous methods.
Researcher Affiliation | Academia | Department of Computer Science and Engineering, Korea University; {imhgchoi, pizard, hyunwoojkim}@korea.ac.kr
Pseudocode | Yes | Algorithm 1 TokenMixup (an illustrative sketch follows the table)
Open Source Code | Yes | Code is available at https://github.com/mlvlab/TokenMixup.
Open Datasets | Yes | In experiments on CIFAR [33], we used the Compact Convolutional Transformer (CCT) [22] as our baseline. For ImageNet-1k [34] experiments, we used the vanilla ViT-B/16 [15] as baseline.
Dataset Splits | Yes | Top-1 validation accuracy is reported for each model... The officially reported Top-1 accuracy is 81.2%, which is achieved by using weights pre-trained on ImageNet-21K and fine-tuning on ImageNet-1K. Other experiment settings follow [22], and all experiments on CIFAR datasets were conducted on a single RTX A6000 GPU.
Hardware Specification | Yes | All experiments on CIFAR datasets were conducted on a single RTX A6000 GPU... Experiments for Horizontal TokenMixup were conducted on a single NVIDIA A100 GPU, and 4 RTX 3090 GPUs were used in parallel for Vertical TokenMixup.
Software Dependencies | No | I cannot find specific software dependencies with version numbers mentioned in the paper. The paper refers to models like CCT and ViT but does not list programming languages, libraries, or frameworks with their respective versions.
Experiment Setup | Yes | For CIFAR experiments, we adopt the 1500-epoch version of CCT. We modified the learning rate scheduler and positional embedding type to achieve better performance than the original papers... In Table 3, we report the average latency of saliency detection per iteration in our CIFAR-100 experiment setting with a batch size of 128. (A timing sketch follows the table.)
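
The Pseudocode and Open Source Code rows point to Algorithm 1 (TokenMixup) and the official repository. The snippet below is only a minimal, non-authoritative sketch of the general idea of attention-guided token-level mixing, not the authors' Algorithm 1: the random pairing, the fixed mixing ratio, and the label-mixing weight are illustrative assumptions, and the full method involves additional components (e.g., how samples are selected for mixing and how tokens are matched) that this sketch deliberately omits. Consult the repository for the actual implementation.

```python
# Minimal sketch of attention-guided token-level mixing between paired samples
# in a batch (NOT the authors' Algorithm 1). Each sample's least-salient tokens
# are replaced by its partner's most-salient tokens, and labels are mixed in
# proportion to the number of swapped tokens. All names and the ratio below
# are illustrative assumptions.
import torch


def attention_guided_token_mix(tokens, labels, saliency, ratio=0.3):
    """tokens: (B, N, D) token embeddings; labels: (B, C) one-hot or soft labels;
    saliency: (B, N) per-token saliency scores, e.g. derived from attention maps."""
    B, N, _ = tokens.shape
    k = max(1, int(ratio * N))                  # number of tokens to swap (assumption)
    perm = torch.randperm(B)                    # pair each sample with another in the batch
    src_tokens, src_saliency = tokens[perm], saliency[perm]

    # target's least-salient tokens and source's most-salient tokens
    tgt_idx = saliency.topk(k, dim=1, largest=False).indices      # (B, k)
    src_idx = src_saliency.topk(k, dim=1, largest=True).indices   # (B, k)

    mixed = tokens.clone()
    batch_idx = torch.arange(B).unsqueeze(1)                      # (B, 1) for broadcasting
    mixed[batch_idx, tgt_idx] = src_tokens[batch_idx, src_idx]

    lam = 1.0 - k / N                           # label-mixing weight (assumption)
    mixed_labels = lam * labels + (1.0 - lam) * labels[perm]
    return mixed, mixed_labels


if __name__ == "__main__":
    # Shapes only: 8 samples, 65 tokens, 256-dim embeddings, 100 classes.
    toks = torch.randn(8, 65, 256)
    labs = torch.eye(100)[torch.randint(0, 100, (8,))]
    sal = torch.rand(8, 65)
    mixed_toks, mixed_labs = attention_guided_token_mix(toks, labs, sal)
    print(mixed_toks.shape, mixed_labs.shape)
```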
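
The Experiment Setup row also cites the average latency of saliency detection per iteration at batch size 128 (Table 3 of the paper). The harness below is a hypothetical way such a per-iteration measurement could be taken; the saliency proxy (class-token attention averaged over heads and layers), the dummy tensor shapes, and the iteration count are assumptions, not the paper's measurement code.

```python
# Hypothetical timing harness for per-iteration saliency detection at batch
# size 128. The saliency proxy and shapes are assumptions; the paper's Table 3
# numbers come from its own implementation.
import time
import torch


def class_token_saliency(attn_maps):
    # attn_maps: list of (B, heads, N, N) attention tensors from one forward pass.
    # Average over heads and layers, then read the attention the class token
    # (index 0) pays to every token as a simple per-token saliency proxy.
    layer_mean = torch.stack([a.mean(dim=1) for a in attn_maps])  # (L, B, N, N)
    return layer_mean.mean(dim=0)[:, 0, :]                        # (B, N)


def average_latency(attn_maps, iters=100):
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        _ = class_token_saliency(attn_maps)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters


# Dummy attention maps: 6 layers, 4 heads, 1 class token + 64 patch tokens.
maps = [torch.randn(128, 4, 65, 65).softmax(dim=-1) for _ in range(6)]
print(f"average saliency latency: {average_latency(maps) * 1e3:.3f} ms/iteration")
```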