SWAT: Spatial Structure Within and Among Tokens

Authors: Kumara Kahatapitiya, Michael S. Ryoo

IJCAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our family of models, SWAT, on image classification and semantic segmentation. We use ImageNet-1K [Deng et al., 2009] and ADE20K [Zhou et al., 2019] as benchmarks to compare against common Transformer/Mixer/Conv architectures such as DeiT [Touvron et al., 2021b], Swin [Liu et al., 2021], MLP-Mixer [Tolstikhin et al., 2021], ResMLP [Touvron et al., 2021a] and VAN [Guo et al., 2022].
Researcher Affiliation | Academia | Kumara Kahatapitiya and Michael S. Ryoo, Stony Brook University, {kkahatapitiy, mryoo}@cs.stonybrook.edu
Pseudocode | No | The paper describes its methods using diagrams and text but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available at github.com/kkahatapitiya/SWAT.
Open Datasets | Yes | We use ImageNet-1K [Deng et al., 2009] and ADE20K [Zhou et al., 2019] as benchmarks to compare against common Transformer/Mixer/Conv architectures...
Dataset Splits | Yes | ImageNet-1K [Deng et al., 2009] is a commonly-used classification benchmark, with 1.2M training images and 50K validation images, annotated with 1000 categories. The ADE20K [Zhou et al., 2019] benchmark contains annotations for semantic segmentation across 150 categories. It comes with 25K annotated images in total, with 20K training, 2K validation and 3K testing.
Hardware Specification | Yes (see the throughput sketch below) | FPS is measured on a single V100 GPU.
Software Dependencies | No | The paper mentions using the 'timm' library, the 'mmsegmentation' framework, and 'PyTorch-like' implementations, but does not provide specific version numbers for any of these software components.
Experiment Setup | Yes (see the complexity and training sketches below) | For all our models, we report Top-1 (%) accuracy on single-crop evaluation with complexity metrics such as Parameters and FLOPs. We train all our models for 300 epochs on inputs of 224x224 using the timm [Wightman, 2019] library. We use the original hyperparameters for all backbones, without further tuning. All models are trained with Mixed Precision.
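
The Hardware Specification row states only that FPS is measured on a single V100 GPU; the paper does not spell out the timing protocol. Below is a minimal sketch of one common way to measure inference throughput in PyTorch. The stand-in timm backbone, batch size, and iteration counts are assumptions, not details from the paper; the actual SWAT models come from github.com/kkahatapitiya/SWAT.

    import time
    import torch
    import timm

    # Stand-in backbone for illustration only; substitute a SWAT model from the
    # authors' repository (github.com/kkahatapitiya/SWAT).
    model = timm.create_model('deit_tiny_patch16_224', pretrained=False).cuda().eval()

    x = torch.randn(64, 3, 224, 224, device='cuda')   # assumed batch size of 64

    with torch.no_grad():
        for _ in range(10):                            # warm-up iterations
            model(x)
        torch.cuda.synchronize()                       # wait for warm-up kernels to finish
        start = time.time()
        iters = 50
        for _ in range(iters):
            model(x)
        torch.cuda.synchronize()                       # ensure all timed kernels have completed
        elapsed = time.time() - start

    fps = iters * x.shape[0] / elapsed
    print(f'Throughput: {fps:.1f} images/s')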
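
The Experiment Setup row also reports complexity metrics (Parameters and FLOPs) alongside accuracy, but the paper does not name the tool used to count FLOPs. The sketch below counts parameters directly and uses fvcore as one common FLOP counter; the choice of fvcore and the stand-in backbone are assumptions.

    import torch
    import timm
    from fvcore.nn import FlopCountAnalysis

    # Stand-in backbone for illustration; substitute a SWAT model from the authors' repo.
    model = timm.create_model('deit_tiny_patch16_224').eval()

    # Parameter count is tool-independent.
    n_params = sum(p.numel() for p in model.parameters())
    print(f'Parameters: {n_params / 1e6:.1f} M')

    # FLOP counting via fvcore is an assumption; the paper does not name a counter.
    flops = FlopCountAnalysis(model, torch.randn(1, 3, 224, 224)).total()
    print(f'FLOPs: {flops / 1e9:.2f} G')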
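
Finally, the Experiment Setup row quotes 300 epochs of training on 224x224 inputs with the timm library and mixed precision, with hyperparameters inherited from the original backbones. The sketch below shows the general shape of such a loop using timm and torch.cuda.amp; the model name, dataset path, optimizer settings, and batch size are placeholders rather than the authors' configuration (in practice timm's training script would drive this).

    import torch
    import timm
    from timm.data import create_transform
    from torchvision.datasets import ImageFolder
    from torch.utils.data import DataLoader

    # Placeholder backbone and data path; the paper trains SWAT variants with each
    # backbone's original hyperparameters, which are not reproduced here.
    model = timm.create_model('deit_tiny_patch16_224', num_classes=1000).cuda()
    train_tf = create_transform(input_size=224, is_training=True)   # standard timm training transform
    train_set = ImageFolder('/path/to/imagenet/train', transform=train_tf)
    loader = DataLoader(train_set, batch_size=256, shuffle=True, num_workers=8, pin_memory=True)

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.05)  # assumed values
    scaler = torch.cuda.amp.GradScaler()            # gradient scaling for mixed precision
    criterion = torch.nn.CrossEntropyLoss()

    for epoch in range(300):                        # 300 epochs, as stated in the paper
        for images, targets in loader:
            images = images.cuda(non_blocking=True)
            targets = targets.cuda(non_blocking=True)
            optimizer.zero_grad()
            with torch.cuda.amp.autocast():         # forward pass in mixed precision
                loss = criterion(model(images), targets)
            scaler.scale(loss).backward()
            scaler.step(optimizer)
            scaler.update()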