FFT-Based Dynamic Token Mixer for Vision

Authors: Yuki Tatsunami, Masato Taki

AAAI 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | The results of image classification and downstream tasks, analysis, and visualization show that our models are helpful. |
| Researcher Affiliation | Collaboration | Yuki Tatsunami (1, 2), Masato Taki (1); 1. Rikkyo University, 2. AnyTech Co., Ltd. |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/okojoalg/dfformer |
| Open Datasets | Yes | Experiments are conducted on the ImageNet-1K benchmark (Krizhevsky, Sutskever, and Hinton 2012), with further experiments on downstream tasks such as ADE20K (Zhou et al. 2017). |
| Dataset Splits | Yes | ImageNet-1K contains 1,281,167 training images and 50,000 validation images. |
| Hardware Specification | Yes | Throughput was benchmarked on a V100 with 16 GB memory at a batch size of 16. |
| Software Dependencies | No | The implementation is based on PyTorch (Paszke et al. 2019) and timm (Wightman 2019); no specific version numbers for these libraries are provided. |
| Experiment Setup | Yes | The AdamW optimizer (Loshchilov and Hutter 2019) is used for 300 epochs with a batch size of 1024, with a base learning rate of 5 × 10⁻⁴ for batch size 512, 20 epochs of linear warm-up, cosine learning-rate decay, and weight decay of 0.05. |
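The experiment-setup row implies a concrete learning-rate schedule: a base rate of 5 × 10⁻⁴ defined for batch size 512, scaled to the actual batch size of 1024, with 20 epochs of linear warm-up followed by cosine decay over the remaining epochs. A minimal sketch of that schedule, assuming the common timm convention of linear batch-size scaling and cosine decay to zero (neither is stated explicitly in the report):

```python
import math

# Assumed hyperparameters, taken from the Experiment Setup row above.
BASE_LR = 5e-4        # base learning rate, defined for batch size 512
BATCH_SIZE = 1024     # actual training batch size
WARMUP_EPOCHS = 20
TOTAL_EPOCHS = 300

# Linear scaling rule (an assumption, common in timm training recipes):
# peak lr = 5e-4 * 1024 / 512 = 1e-3
peak_lr = BASE_LR * BATCH_SIZE / 512

def lr_at(epoch: float) -> float:
    """Learning rate at a (possibly fractional) epoch:
    linear warm-up for WARMUP_EPOCHS, then cosine decay to zero."""
    if epoch < WARMUP_EPOCHS:
        return peak_lr * epoch / WARMUP_EPOCHS
    progress = (epoch - WARMUP_EPOCHS) / (TOTAL_EPOCHS - WARMUP_EPOCHS)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))

print(lr_at(10))   # mid warm-up: half of the peak rate
print(lr_at(20))   # end of warm-up: peak rate, 1e-3
print(lr_at(300))  # end of training: decayed to 0.0
```

In a real training run this curve would typically be driven by timm's scheduler machinery rather than computed by hand; the function above only makes the arithmetic of the quoted recipe explicit.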