Scattering Vision Transformer: Spectral Mixing Matters
Authors: Badri Patro, Vijay Agneeswaran
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that SVT achieves state-of-the-art performance on the ImageNet dataset with a significant reduction in the number of parameters and FLOPS. SVT shows a 2% improvement over LiTv2 and iFormer. SVT-H-S reaches 84.2% top-1 accuracy, while SVT-H-B reaches 85.2% (state-of-the-art among base versions) and SVT-H-L reaches 85.7% (again state-of-the-art among large versions). SVT also shows comparable results in other vision tasks such as instance segmentation. SVT also outperforms other transformers in transfer learning on standard datasets such as CIFAR10, CIFAR100, Oxford Flower, and Stanford Car. |
| Researcher Affiliation | Industry | Badri Narayana Patro, Microsoft (badripatro@microsoft.com); Vijay Srinivas Agneeswaran, Microsoft (vagneeswaran@microsoft.com) |
| Pseudocode | No | The paper describes the architecture and methods but does not provide formal pseudocode or an algorithm block. |
| Open Source Code | Yes | A project page is available at https://badripatro.github.io/svt/. |
| Open Datasets | Yes | We trained and evaluated ImageNet1K [10] from scratch for the image recognition task. b) We performed transfer learning on CIFAR10 [30], CIFAR-100 [30], Stanford Car [29], and Oxford Flower-102 [41] for the image recognition task. d) We fine-tune SVT for downstream instance segmentation tasks. ... pre-trained SVT model for the downstream instance segmentation task and obtain good results on the MS-COCO dataset as shown in Table 5. (A data-loading sketch for the transfer-learning setup follows the table.) |
| Dataset Splits | No | The paper uses standard datasets such as ImageNet1K and COCO val2017, but does not explicitly specify training/validation/test split percentages or methodology; it implies the standard splits through the dataset names without detailing them for reproducibility. |
| Hardware Specification | Yes | We report latency per sample on A100 GPU. (A timing sketch showing one common way to measure this follows the table.) |
| Software Dependencies | No | The paper mentions using a "standard torch package [48]" but does not provide version numbers for PyTorch or any other software dependency, making it difficult to reproduce the software environment precisely. (An environment-capture sketch follows the table.) |
| Experiment Setup | No | The paper does not detail experimental setup parameters such as learning rates, batch sizes, optimizers, or training schedules in the provided text. |
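
The paper names its transfer-learning datasets but not a loading or fine-tuning recipe. Below is a minimal sketch of the CIFAR-10 portion of that setup. SVT weights are not distributed through torchvision, so an ImageNet-pretrained ResNet-50 stands in for the backbone; the batch size, learning rate, and input resolution here are assumptions for illustration, not the authors' settings.

```python
# Minimal transfer-learning sketch for the CIFAR-10 setup quoted above.
# ResNet-50 is a stand-in backbone; hyperparameters are illustrative.
import torch
import torch.nn as nn
from torchvision import datasets, transforms, models

transform = transforms.Compose([
    transforms.Resize(224),  # CIFAR images are 32x32; upsample to the ImageNet input size
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
train_set = datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True, num_workers=4)

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, 10)  # replace the ImageNet head with a 10-class CIFAR head

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
model.train()
for images, labels in train_loader:
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    break  # single step shown; a real run iterates over epochs
```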
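
The table cites per-sample latency on an A100, but the paper does not spell out a timing protocol. The sketch below shows one common way to measure it with CUDA events; the batch size, warmup count, and iteration count are assumptions, not values from the paper.

```python
# Hedged sketch of measuring "latency per sample" on a GPU with CUDA events.
import torch

@torch.no_grad()
def latency_per_sample_ms(model, batch_size=32, iters=100, warmup=10):
    model = model.cuda().eval()
    x = torch.randn(batch_size, 3, 224, 224, device="cuda")
    for _ in range(warmup):  # warm up kernels / cuDNN autotuning before timing
        model(x)
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        model(x)
    end.record()
    torch.cuda.synchronize()            # wait for all queued kernels to finish
    total_ms = start.elapsed_time(end)  # milliseconds between the two events
    return total_ms / (iters * batch_size)
```

CUDA events time the work actually executed on the device, which avoids the skew that naive wall-clock timing around asynchronous kernel launches would introduce.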
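
Because the paper pins no dependency versions, anyone attempting a reproduction should record their own environment. A minimal sketch using PyTorch's built-in environment report:

```python
# Record the software environment for a reproduction attempt.
import torch
import torchvision

print("torch:", torch.__version__)
print("torchvision:", torchvision.__version__)
print("CUDA:", torch.version.cuda, "| cuDNN:", torch.backends.cudnn.version())
print("GPU:", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "none")

# Or dump the full report PyTorch itself collects for bug reports:
from torch.utils.collect_env import get_pretty_env_info
print(get_pretty_env_info())
```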