Blending Anti-Aliasing into Vision Transformer

Authors: Shengju Qian, Hao Shao, Yi Zhu, Mu Li, Jiaya Jia

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive investigations are conducted, primarily on filtering choices such as the basic Gaussian blurring filters in [10] and learnable convolutional filters, as well as on different placements of these filters. Specifically, we propose a plug-and-play Aliasing-Reduction Module (ARM) to alleviate the aforementioned issue. We investigate the effectiveness and generalization of the proposed method across multiple tasks and various vision transformer families. This lightweight design consistently attains a clear boost over several well-known structures. (A hedged sketch of such a filtering module follows the table.)
Researcher Affiliation | Collaboration | Shengju Qian (The Chinese University of Hong Kong), Hao Shao (Tsinghua University), Yi Zhu (Amazon Inc.), Mu Li (Amazon Inc.), Jiaya Jia (The Chinese University of Hong Kong)
Pseudocode | No | The paper does not contain any structured pseudocode blocks or clearly labeled algorithm sections; it describes methods in prose and provides mathematical formulations.
Open Source Code | Yes | "Code is made available at https://github.com/amazon-research/anti-aliasing-transformer."
Open Datasets | Yes | "All models are trained for 300 epochs on the ImageNet [51] training set using 8 Tesla V100 GPUs." Reference [51] is "ImageNet: A large-scale hierarchical image database. In CVPR, 2009.", indicating a publicly available dataset.
Dataset Splits | Yes | "We report the performance of different anti-aliased variants on the ImageNet [51] validation set with 50K images." In Section 4.6, for a specific analysis: "The original 50K images are split into training split with 45K images and testing split with 5K images." (A hedged split sketch follows the table.)
Hardware Specification | Yes | All models are trained for 300 epochs on the ImageNet [51] training set using 8 Tesla V100 GPUs. Throughput is measured on a Tesla V100 GPU using [57]. (A hedged throughput-measurement sketch follows the table.)
Software Dependencies | No | The paper mentions software such as "PyTorch [58]" and refers to the "open-source implementation [50]" for Swin-T, but it does not specify exact version numbers for any software libraries or dependencies.
Experiment Setup | Yes | All training and testing parameters remain consistent with the open-source implementation [50]. Except for T2T-ViT [7], which is originally trained for 310 epochs, all models are trained for 300 epochs on the ImageNet [51] training set using 8 Tesla V100 GPUs. When our component is applied, the training configurations, such as the optimizers and data augmentations, are kept the same as the baselines.
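
The Aliasing-Reduction Module noted in the Research Type row is described as a plug-and-play filter, either a fixed Gaussian blur as in [10] or a learnable convolutional filter, inserted into vision-transformer stages. Below is a minimal PyTorch sketch of such a filtering module; the class name, the depthwise layout, the 3x3 kernel, and the token-to-map reshaping are illustrative assumptions rather than the paper's exact ARM design.

```python
import torch
import torch.nn as nn


class AliasingReductionFilter(nn.Module):
    """Hypothetical sketch of a plug-and-play low-pass filter for ViT features.

    Either a fixed Gaussian-like blur (as in [10]) or a learnable depthwise
    convolution is applied per channel to the 2D feature map of a stage.
    """

    def __init__(self, channels, learnable=False):
        super().__init__()
        # Depthwise 3x3 convolution: one filter per channel.
        self.filter = nn.Conv2d(channels, channels, kernel_size=3,
                                padding=1, groups=channels, bias=False)
        if not learnable:
            # Fixed binomial (Gaussian-like) blur kernel shared across channels.
            g = torch.tensor([[1., 2., 1.],
                              [2., 4., 2.],
                              [1., 2., 1.]]) / 16.0
            self.filter.weight.data.copy_(g.expand(channels, 1, 3, 3))
            self.filter.weight.requires_grad_(False)

    def forward(self, x, h, w):
        # x: (B, N, C) token sequence with N = h * w.
        b, n, c = x.shape
        feat = x.transpose(1, 2).reshape(b, c, h, w)   # tokens -> 2D map
        feat = self.filter(feat)                        # low-pass filtering
        return feat.reshape(b, c, n).transpose(1, 2)    # 2D map -> tokens
```

The paper studies several placements of such filters inside transformer blocks; the sketch above shows only the filtering operation itself.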
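
For the Dataset Splits row, the 45K/5K split of the ImageNet validation set could be reproduced roughly as below; the data path and the random seed are placeholders, since the excerpt does not state how the split was drawn.

```python
import torch
from torch.utils.data import random_split
from torchvision.datasets import ImageFolder

# ImageNet validation set (50K images); the path is a placeholder.
val_set = ImageFolder("/path/to/imagenet/val")

# Split into 45K training and 5K testing images for the Section 4.6 analysis.
# The seed is a placeholder, not the paper's actual choice.
train_part, test_part = random_split(
    val_set, [45_000, 5_000],
    generator=torch.Generator().manual_seed(0))
```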
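
For the Hardware Specification row, throughput is measured on a single Tesla V100 following [57]. A generic measurement loop of the kind commonly used is sketched below; the batch size, warm-up passes, and iteration count are assumptions rather than the paper's exact protocol.

```python
import time
import torch


@torch.no_grad()
def measure_throughput(model, batch_size=64, image_size=224,
                       warmup=10, iters=30, device="cuda"):
    """Images per second on one GPU; protocol constants are assumptions."""
    model = model.eval().to(device)
    x = torch.randn(batch_size, 3, image_size, image_size, device=device)
    for _ in range(warmup):           # warm-up passes to stabilize timings
        model(x)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        model(x)
    torch.cuda.synchronize()
    return batch_size * iters / (time.time() - start)
```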