Blending Anti-Aliasing into Vision Transformer
Authors: Shengju Qian, Hao Shao, Yi Zhu, Mu Li, Jiaya Jia
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive investigations are conducted, primarily on filtering choices, such as the basic Gaussian blurring filters in [10] and learnable convolutional filters, as well as on different placements of these filters. Specifically, we propose a plug-and-play Aliasing-Reduction Module (ARM) to alleviate the aforementioned issue. We investigate the effectiveness and generalization of the proposed method across multiple tasks and various vision transformer families. This lightweight design consistently attains a clear boost over several well-known architectures (a hedged sketch of such a low-pass filtering module appears below the table). |
| Researcher Affiliation | Collaboration | Shengju Qian¹, Hao Shao², Yi Zhu³, Mu Li³, Jiaya Jia¹. ¹The Chinese University of Hong Kong; ²Tsinghua University; ³Amazon Inc. |
| Pseudocode | No | The paper does not contain any structured pseudocode blocks or clearly labeled algorithm sections. It describes methods in prose and provides mathematical formulations. |
| Open Source Code | Yes | Code is made available at https://github.com/amazon-research/anti-aliasing-transformer. |
| Open Datasets | Yes | All models are trained for 300 epochs on the ImageNet [51] training set using 8 Tesla V100 GPUs. Reference [51] is "ImageNet: A large-scale hierarchical image database. In CVPR, 2009.", indicating a publicly available dataset. |
| Dataset Splits | Yes | We report the performance of different anti-aliased variants on the ImageNet [51] validation set with 50K images. In Section 4.6, for a specific analysis: "The original 50K images are split into training split with 45K images and testing split with 5K images." (a minimal split sketch appears below the table). |
| Hardware Specification | Yes | All models are trained for 300 epochs on the ImageNet [51] training set using 8 Tesla V100 GPUs. Throughput is measured on a Tesla V100 GPU using [57]. |
| Software Dependencies | No | The paper mentions software like "PyTorch [58]" and refers to "open-source implementation [50]" for Swin-T, but it does not specify exact version numbers for any software libraries or dependencies within the text. |
| Experiment Setup | Yes | All training and testing parameters remain consistent with its open-source implementation [50]. Except for T2T-ViT [7], which is originally trained for 310 epochs, all models are trained for 300 epochs on the ImageNet [51] training set using 8 Tesla V100 GPUs. When our component is applied, the training configurations, such as the optimizers and data augmentations, are kept the same as the baselines'. |
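
The exact ARM design is given in the paper and the official repository linked above; the following is only a minimal PyTorch sketch of the two filtering choices the table mentions, a fixed Gaussian blur and a learnable depthwise convolutional filter, applied to transformer tokens laid out on a 2D grid. The names `gaussian_kernel2d` and `BlurFilter` are illustrative assumptions, not the authors' API.

```python
# Minimal sketch of a low-pass filtering module for vision-transformer tokens.
# Hypothetical names; the official ARM implementation lives at
# https://github.com/amazon-research/anti-aliasing-transformer.
import torch
import torch.nn as nn


def gaussian_kernel2d(size: int = 3, sigma: float = 1.0) -> torch.Tensor:
    """Build a normalized 2D Gaussian kernel (a basic low-pass filter)."""
    coords = torch.arange(size, dtype=torch.float32) - (size - 1) / 2
    g = torch.exp(-coords ** 2 / (2 * sigma ** 2))
    kernel = torch.outer(g, g)
    return kernel / kernel.sum()


class BlurFilter(nn.Module):
    """Depthwise low-pass filtering over token features on an h x w grid.

    With learnable=False this mimics fixed Gaussian blurring as in [10];
    with learnable=True the weights are trained, like the paper's
    learnable convolutional filters.
    """

    def __init__(self, channels: int, size: int = 3, learnable: bool = True):
        super().__init__()
        self.conv = nn.Conv2d(
            channels, channels, kernel_size=size,
            padding=size // 2, groups=channels, bias=False,
        )
        kernel = gaussian_kernel2d(size).expand(channels, 1, size, size)
        self.conv.weight.data.copy_(kernel)
        self.conv.weight.requires_grad = learnable

    def forward(self, tokens: torch.Tensor, h: int, w: int) -> torch.Tensor:
        # tokens: (B, N, C) with N == h * w spatial tokens
        b, n, c = tokens.shape
        x = tokens.transpose(1, 2).reshape(b, c, h, w)  # to (B, C, H, W)
        x = self.conv(x)                                # depthwise blur
        return x.reshape(b, c, n).transpose(1, 2)       # back to (B, N, C)
```

For example, with 14×14 patch tokens of dimension 384, `BlurFilter(384)(tokens, 14, 14)` returns filtered tokens of the same shape; where in a block such filtering is inserted is one of the placement choices the paper ablates.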
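
The 45K/5K re-split of the validation set quoted in the Dataset Splits row is a simple partition. The paper does not state the exact partitioning rule, so the sketch below assumes a seeded random split, with a dummy dataset standing in for the real 50K-image validation set.

```python
# Hypothetical sketch of re-splitting the 50K-image validation set into
# 45K training / 5K testing images; the seeded random split is an assumption.
import torch
from torch.utils.data import TensorDataset, random_split

val_set = TensorDataset(torch.zeros(50_000, 1))  # placeholder for ImageNet val

g = torch.Generator().manual_seed(0)
train_split, test_split = random_split(val_set, [45_000, 5_000], generator=g)
print(len(train_split), len(test_split))  # -> 45000 5000
```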