Scattering Vision Transformer: Spectral Mixing Matters

Authors: Badri Patro, Vijay Agneeswaran

NeurIPS 2023

Reproducibility assessment — each variable below is followed by its result and the LLM's supporting response.
Research Type: Experimental
LLM Response: We show that SVT achieves state-of-the-art performance on the ImageNet dataset with a significant reduction in the number of parameters and FLOPS. SVT shows a 2% improvement over LiTv2 and iFormer. SVT-H-S reaches 84.2% top-1 accuracy, while SVT-H-B reaches 85.2% (state-of-the-art for base versions) and SVT-H-L reaches 85.7% (again state-of-the-art for large versions). SVT also shows comparable results in other vision tasks such as instance segmentation, and outperforms other transformers in transfer learning on standard datasets such as CIFAR10, CIFAR100, Oxford Flower, and Stanford Car.
Researcher Affiliation: Industry
LLM Response: Badri Narayana Patro, Microsoft (badripatro@microsoft.com); Vijay Srinivas Agneeswaran, Microsoft (vagneeswaran@microsoft.com)
Pseudocode: No
LLM Response: The paper describes the architecture and methods but does not provide a formal pseudocode or algorithm block.
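Since the paper gives no pseudocode, here is a minimal sketch of a generic spectral token-mixing block, assuming an FFT-based global filter in the spirit of GFNet. SVT itself uses a DTCWT-based scattering transform with Einstein mixing, so this is not the authors' design; it only illustrates the "mix tokens in the frequency domain" idea, and all module and parameter names are hypothetical.

```python
# Hypothetical sketch: FFT-based spectral token mixing, NOT the
# paper's DTCWT scattering design.
import torch
import torch.nn as nn


class SpectralMixer(nn.Module):
    """Mixes tokens by filtering their 2D spectrum with learned weights."""

    def __init__(self, dim: int, h: int, w: int):
        super().__init__()
        # One complex-valued filter per channel and spatial frequency
        # (stored as real/imag pairs in the last dimension).
        self.filter = nn.Parameter(torch.randn(h, w // 2 + 1, dim, 2) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, height, width, channels)
        freq = torch.fft.rfft2(x, dim=(1, 2), norm="ortho")
        freq = freq * torch.view_as_complex(self.filter)
        return torch.fft.irfft2(freq, s=x.shape[1:3], dim=(1, 2), norm="ortho")


# Usage: mix a 14 x 14 grid of token embeddings.
tokens = torch.randn(2, 14, 14, 384)
mixer = SpectralMixer(dim=384, h=14, w=14)
print(mixer(tokens).shape)  # torch.Size([2, 14, 14, 384])
```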
Open Source Code: Yes
LLM Response: The project page is available at https://badripatro.github.io/svt/.
Open Datasets: Yes
LLM Response: We trained and evaluated ImageNet1K [10] from scratch for the image recognition task. b) We performed transfer learning on CIFAR10 [30], CIFAR-100 [30], Stanford Car [29], and Oxford Flower-102 [41] for the image recognition task. d) We fine-tune SVT for downstream instance segmentation tasks. ... pre-trained SVT model for the downstream instance segmentation task and obtain good results on the MS-COCO dataset, as shown in Table 5.
Dataset Splits: No
LLM Response: The paper mentions using standard datasets like ImageNet1K and COCO val2017, but does not explicitly specify training/validation/test split percentages or methodology. It implies standard splits through common dataset names but does not detail them for reproducibility.
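When a paper cites ImageNet-1K without split details, the standard train/val split shipped with the dataset (1.28M train images, 50K validation images) is almost always what is implied. A minimal sketch of that conventional loading with torchvision; the data path and transforms are assumptions, not from the paper:

```python
# Conventional ImageNet-1K split; the directory layout already
# encodes it, so no manual percentage split is performed. Path and
# transforms are hypothetical.
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

train_set = datasets.ImageNet("/data/imagenet", split="train", transform=transform)
val_set = datasets.ImageNet("/data/imagenet", split="val", transform=transform)
print(len(train_set), len(val_set))  # 1281167 50000
```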
Hardware Specification: Yes
LLM Response: We report latency per sample on an A100 GPU.
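The paper reports latency but not the timing procedure. A minimal sketch of how per-sample GPU latency is commonly measured; the warmup and iteration counts are assumptions:

```python
# Assumed timing procedure; the paper does not describe its own.
import time
import torch


@torch.no_grad()
def latency_per_sample(model, batch, warmup=20, iters=100):
    model.eval()
    for _ in range(warmup):      # warm up kernels and caches
        model(batch)
    torch.cuda.synchronize()     # wait for queued GPU work to finish
    start = time.perf_counter()
    for _ in range(iters):
        model(batch)
    torch.cuda.synchronize()
    elapsed = time.perf_counter() - start
    return elapsed / (iters * batch.shape[0])


# Usage (hypothetical model):
# model = build_svt().cuda()
# x = torch.randn(32, 3, 224, 224, device="cuda")
# print(f"{latency_per_sample(model, x) * 1e3:.3f} ms/sample")
```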
Software Dependencies: No
LLM Response: The paper mentions using a "standard torch package [48]" but does not give version numbers for PyTorch or any other software dependency, making it difficult to precisely reproduce the software environment.
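Absent pinned versions, a reproducer would have to capture the environment themselves; a minimal sketch of recording the versions that matter most for a torch-based setup:

```python
# Record the key framework versions, since the paper pins none.
import sys
import torch
import torchvision

print("python     ", sys.version.split()[0])
print("torch      ", torch.__version__)
print("torchvision", torchvision.__version__)
print("cuda       ", torch.version.cuda)
print("cudnn      ", torch.backends.cudnn.version())
```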
Experiment Setup: No
LLM Response: The paper does not explicitly detail experimental setup parameters such as learning rates, batch sizes, optimizers, or training schedules in the provided text.
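For context only: ImageNet-scale vision transformers are commonly trained with a DeiT-style recipe. The values below illustrate the kind of hyperparameters a full setup would need; none of them are confirmed by the paper.

```python
# Hedged illustration of a common DeiT-style recipe; NOT the authors'
# (unreported) settings.
from dataclasses import dataclass


@dataclass
class TrainConfig:
    epochs: int = 300
    batch_size: int = 1024        # global batch across GPUs
    optimizer: str = "adamw"
    base_lr: float = 1e-3         # typically scaled with batch size
    weight_decay: float = 0.05
    warmup_epochs: int = 5
    lr_schedule: str = "cosine"
    label_smoothing: float = 0.1
    drop_path_rate: float = 0.1   # stochastic depth


print(TrainConfig())
```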