SWAT: Spatial Structure Within and Among Tokens
Authors: Kumara Kahatapitiya, Michael S. Ryoo
IJCAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our family of models, SWAT, on image classification and semantic segmentation. We use ImageNet-1K [Deng et al., 2009] and ADE20K [Zhou et al., 2019] as benchmarks to compare against common Transformer/Mixer/Conv architectures such as DeiT [Touvron et al., 2021b], Swin [Liu et al., 2021], MLP-Mixer [Tolstikhin et al., 2021], ResMLP [Touvron et al., 2021a] and VAN [Guo et al., 2022]. |
| Researcher Affiliation | Academia | Kumara Kahatapitiya and Michael S. Ryoo Stony Brook University {kkahatapitiy, mryoo}@cs.stonybrook.edu |
| Pseudocode | No | The paper describes its methods using diagrams and text but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at github.com/kkahatapitiya/SWAT. |
| Open Datasets | Yes | We use ImageNet-1K [Deng et al., 2009] and ADE20K [Zhou et al., 2019] as benchmarks to compare against common Transformer/Mixer/Conv architectures... |
| Dataset Splits | Yes | ImageNet-1K [Deng et al., 2009] is a commonly-used classification benchmark, with 1.2M training images and 50K validation images, annotated with 1000 categories. The ADE20K [Zhou et al., 2019] benchmark contains annotations for semantic segmentation across 150 categories. It comes with 25K annotated images in total, with 20K training, 2K validation and 3K testing. |
| Hardware Specification | Yes | FPS is measured on a single V100 GPU. |
| Software Dependencies | No | The paper mentions using the 'timm' library, 'mmsegmentation' framework, and 'PyTorch-like' implementations, but does not provide specific version numbers for any of these software components. |
| Experiment Setup | Yes | For all our models, we report Top-1 (%) accuracy on single-crop evaluation with complexity metrics such as Parameters and FLOPs. We train all our models for 300 epochs on inputs of 224x224 using the timm [Wightman, 2019] library. We use the original hyperparameters for all backbones, without further tuning. All models are trained with Mixed Precision. |
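
The Experiment Setup row reports 300-epoch training on 224x224 inputs with the timm library and mixed precision, but no training script is quoted. Below is a minimal, hedged sketch of what such a recipe could look like in PyTorch with timm; the placeholder backbone, batch size, optimizer, and learning rate are assumptions for illustration, not the paper's exact settings (the authors state they reuse each backbone's original hyperparameters).

```python
# Minimal sketch of the reported recipe (300 epochs, 224x224 single-crop inputs,
# mixed precision) using PyTorch + timm. Batch size, optimizer, and learning
# rate below are assumptions, not the paper's hyperparameters.
import torch
import timm
from timm.data import create_dataset, create_loader

device = torch.device("cuda")

# Placeholder backbone: the actual SWAT models are defined in
# github.com/kkahatapitiya/SWAT, not in stock timm.
model = timm.create_model("deit_tiny_patch16_224", num_classes=1000).to(device)

train_set = create_dataset("", root="path/to/imagenet-1k", split="train", is_training=True)
train_loader = create_loader(
    train_set,
    input_size=(3, 224, 224),   # 224x224 inputs, as reported
    batch_size=128,             # assumed; not stated in the excerpt
    is_training=True,
    use_prefetcher=False,       # keep tensors on CPU until the loop below
)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.05)  # assumed
scaler = torch.cuda.amp.GradScaler()          # mixed-precision training, as reported
loss_fn = torch.nn.CrossEntropyLoss()

for epoch in range(300):                      # 300 epochs, as reported
    model.train()
    for images, targets in train_loader:
        images, targets = images.to(device), targets.to(device)
        optimizer.zero_grad(set_to_none=True)
        with torch.cuda.amp.autocast():       # forward/backward in mixed precision
            loss = loss_fn(model(images), targets)
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
```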
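
Similarly, the Hardware Specification row states only that FPS is measured on a single V100 GPU. A common way to measure single-GPU inference throughput is sketched below; the batch size, warm-up count, and placeholder model are assumptions, since the paper does not describe its measurement script.

```python
# Rough sketch of single-GPU inference throughput (FPS) measurement. The paper
# reports FPS on one V100 but does not give its measurement script, so the
# batch size, warm-up count, and placeholder model here are assumptions.
import time
import torch
import timm

device = torch.device("cuda")
model = timm.create_model("deit_tiny_patch16_224", num_classes=1000).to(device).eval()
x = torch.randn(64, 3, 224, 224, device=device)  # assumed batch size

with torch.no_grad():
    for _ in range(10):               # warm-up iterations
        model(x)
    torch.cuda.synchronize()
    start, iters = time.time(), 50
    for _ in range(iters):
        model(x)
    torch.cuda.synchronize()

fps = iters * x.shape[0] / (time.time() - start)
print(f"throughput: {fps:.1f} images/s")
```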