TokenMixup: Efficient Attention-guided Token-level Data Augmentation for Transformers

Authors: Hyeong Kyu Choi, Joonmyung Choi, Hyunwoo J. Kim

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that our methods significantly improve the baseline models' performance on CIFAR and ImageNet-1K, while being more efficient than previous methods.
Researcher Affiliation | Academia | Department of Computer Science and Engineering, Korea University; {imhgchoi, pizard, hyunwoojkim}@korea.ac.kr
Pseudocode | Yes | Algorithm 1 TokenMixup (an illustrative sketch follows the table)
Open Source Code | Yes | Code is available at https://github.com/mlvlab/TokenMixup.
Open Datasets | Yes | In experiments on CIFAR [33], we used the Compact Convolutional Transformer (CCT) [22] as our baseline. For ImageNet-1k [34] experiments, we used the vanilla ViT-B/16 [15] as baseline.
Dataset Splits | Yes | Top-1 validation accuracy is reported for each model... The officially reported Top-1 accuracy is 81.2%, which is achieved by using weights pre-trained on ImageNet-21K and fine-tuning on ImageNet-1K. Other experiment settings follow [22], and all experiments on CIFAR datasets were conducted on a single RTX A6000 GPU.
Hardware Specification | Yes | All experiments on CIFAR datasets were conducted on a single RTX A6000 GPU... Experiments for Horizontal TokenMixup were conducted on a single NVIDIA A100 GPU, and 4 RTX 3090 GPUs were used in parallel for Vertical TokenMixup.
Software Dependencies | No | I cannot find specific software dependencies with version numbers mentioned in the paper. The paper refers to models like CCT and ViT but does not list programming languages, libraries, or frameworks with their respective versions.
Experiment Setup | Yes | For CIFAR experiments, we adopt the 1500-epoch version of CCT. We modified the learning rate scheduler and positional embedding type to achieve better performance than the original papers... In Table 3, we report the average latency of saliency detection per iteration in our CIFAR-100 experiment setting with a batch size of 128. (A timing sketch follows the table.)
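
The Pseudocode and Open Source Code rows point to Algorithm 1 (TokenMixup) and the official repository. The snippet below is only a minimal, non-authoritative sketch of the general idea of attention-guided token-level mixing, not the authors' Algorithm 1: the random pairing, the fixed mixing ratio, and the label-mixing weight are illustrative assumptions, and the full method involves additional components (e.g., how samples are selected for mixing and how tokens are matched) that this sketch deliberately omits. Consult the repository for the actual implementation.

```python
# Minimal sketch of attention-guided token-level mixing between paired samples
# in a batch (NOT the authors' Algorithm 1). Each sample's least-salient tokens
# are replaced by its partner's most-salient tokens, and labels are mixed in
# proportion to the number of swapped tokens. All names and the ratio below
# are illustrative assumptions.
import torch


def attention_guided_token_mix(tokens, labels, saliency, ratio=0.3):
    """tokens: (B, N, D) token embeddings; labels: (B, C) one-hot or soft labels;
    saliency: (B, N) per-token saliency scores, e.g. derived from attention maps."""
    B, N, _ = tokens.shape
    k = max(1, int(ratio * N))                  # number of tokens to swap (assumption)
    perm = torch.randperm(B)                    # pair each sample with another in the batch
    src_tokens, src_saliency = tokens[perm], saliency[perm]

    # target's least-salient tokens and source's most-salient tokens
    tgt_idx = saliency.topk(k, dim=1, largest=False).indices      # (B, k)
    src_idx = src_saliency.topk(k, dim=1, largest=True).indices   # (B, k)

    mixed = tokens.clone()
    batch_idx = torch.arange(B).unsqueeze(1)                      # (B, 1) for broadcasting
    mixed[batch_idx, tgt_idx] = src_tokens[batch_idx, src_idx]

    lam = 1.0 - k / N                           # label-mixing weight (assumption)
    mixed_labels = lam * labels + (1.0 - lam) * labels[perm]
    return mixed, mixed_labels


if __name__ == "__main__":
    # Shapes only: 8 samples, 65 tokens, 256-dim embeddings, 100 classes.
    toks = torch.randn(8, 65, 256)
    labs = torch.eye(100)[torch.randint(0, 100, (8,))]
    sal = torch.rand(8, 65)
    mixed_toks, mixed_labs = attention_guided_token_mix(toks, labs, sal)
    print(mixed_toks.shape, mixed_labs.shape)
```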
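
The Experiment Setup row also cites the average latency of saliency detection per iteration at batch size 128 (Table 3 of the paper). The harness below is a hypothetical way such a per-iteration measurement could be taken; the saliency proxy (class-token attention averaged over heads and layers), the dummy tensor shapes, and the iteration count are assumptions, not the paper's measurement code.

```python
# Hypothetical timing harness for per-iteration saliency detection at batch
# size 128. The saliency proxy and shapes are assumptions; the paper's Table 3
# numbers come from its own implementation.
import time
import torch


def class_token_saliency(attn_maps):
    # attn_maps: list of (B, heads, N, N) attention tensors from one forward pass.
    # Average over heads and layers, then read the attention the class token
    # (index 0) pays to every token as a simple per-token saliency proxy.
    layer_mean = torch.stack([a.mean(dim=1) for a in attn_maps])  # (L, B, N, N)
    return layer_mean.mean(dim=0)[:, 0, :]                        # (B, N)


def average_latency(attn_maps, iters=100):
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        _ = class_token_saliency(attn_maps)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters


# Dummy attention maps: 6 layers, 4 heads, 1 class token + 64 patch tokens.
maps = [torch.randn(128, 4, 65, 65).softmax(dim=-1) for _ in range(6)]
print(f"average saliency latency: {average_latency(maps) * 1e3:.3f} ms/iteration")
```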