FFT-Based Dynamic Token Mixer for Vision
Authors: Yuki Tatsunami, Masato Taki
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The results of image classification and downstream tasks, analysis, and visualization show that our models are helpful. |
| Researcher Affiliation | Collaboration | Yuki Tatsunami (1,2), Masato Taki (1); (1) Rikkyo University, (2) AnyTech Co., Ltd. |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/okojoalg/dfformer |
| Open Datasets | Yes | We conduct experiments on the ImageNet-1K benchmark (Krizhevsky, Sutskever, and Hinton 2012) and perform further experiments to confirm downstream tasks such as ADE20K (Zhou et al. 2017). |
| Dataset Splits | Yes | ImageNet-1K ... contains 1,281,167 training images and 50,000 validation images. |
| Hardware Specification | Yes | Throughput has been benchmarked on a V100 with 16GB memory at a batch size of 16. |
| Software Dependencies | No | The implementation is based on PyTorch (Paszke et al. 2019) and timm (Wightman 2019). No specific version numbers for these libraries are provided. |
| Experiment Setup | Yes | We employ the AdamW (Loshchilov and Hutter 2019) optimizer for 300 epochs with a batch size of 1024. A base learning rate of 5×10⁻⁴ for batch size 512, 20 epochs of linear warm-up, cosine decay for the learning rate, and weight decay of 0.05 are used. |
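The schedule quoted above (20-epoch linear warm-up into cosine decay over 300 epochs) can be sketched as a small standalone function. This is an illustrative reconstruction, not the authors' code: the `lr_at_epoch` name is hypothetical, and scaling the base rate linearly with batch size (5×10⁻⁴ at 512 → 1×10⁻³ at 1024) is an assumption based on the common timm convention.

```python
import math

def lr_at_epoch(epoch, base_lr=1e-3, warmup_epochs=20,
                total_epochs=300, min_lr=0.0):
    """Linear warm-up followed by cosine decay, per epoch.

    base_lr=1e-3 assumes the stated 5e-4 at batch 512 is scaled
    linearly to batch 1024 (an assumption, not stated in the table).
    """
    if epoch < warmup_epochs:
        # Linear ramp from base_lr / warmup_epochs up to base_lr.
        return base_lr * (epoch + 1) / warmup_epochs
    # Cosine decay from base_lr down toward min_lr.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

For example, the rate reaches its peak of 1×10⁻³ at the end of warm-up (epoch 19) and decays to near zero by epoch 299; in a real run this would feed a PyTorch `AdamW` optimizer with `weight_decay=0.05` via a per-epoch LR update.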