MSPE: Multi-Scale Patch Embedding Prompts Vision Transformers to Any Resolution

Authors: Wenzhuo Liu, Fei Zhu, Shijie Ma, Cheng-Lin Liu

NeurIPS 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Experiments in image classification, segmentation, and detection tasks demonstrate the effectiveness of MSPE, yielding superior performance on low-resolution inputs and performance comparable to existing methods on high-resolution inputs. |
| Researcher Affiliation | Academia | Wenzhuo Liu (1,2), Fei Zhu (3), Shijie Ma (1,2), Cheng-Lin Liu (1,2). 1: School of Artificial Intelligence, UCAS; 2: State Key Laboratory of Multimodal Artificial Intelligence Systems, CASIA; 3: Centre for Artificial Intelligence and Robotics, HKISI-CAS. |
| Pseudocode | Yes | Algorithm 1 in Appendix E.2 details the training procedure of MSPE, along with a PyTorch-style implementation. |
| Open Source Code | Yes | Code is available at https://github.com/SmallPigPeppa/MSPE. |
| Open Datasets | Yes | Experiments are conducted on 4 benchmark datasets: ImageNet-1K [26] for classification, ADE20K [27] and Cityscapes [28] for semantic segmentation, and COCO2017 [29] for object detection. |
| Dataset Splits | No | The paper uses benchmark datasets (ImageNet-1K, ADE20K, Cityscapes, COCO2017) that have predefined splits, but it does not explicitly state the split percentages or sample counts for training, validation, and test sets. |
| Hardware Specification | Yes | Experiments run on a machine with two AMD EPYC 7543 32-core processors (two threads per core), 496 GB of memory, and 8× NVIDIA GeForce RTX 4090 graphics cards. |
| Software Dependencies | No | The paper mentions open-source libraries such as timm, MMDetection, MMSegmentation, and PyTorch, but it does not specify version numbers for any of these dependencies. |
| Experiment Setup | Yes | MSPE is trained with the SGD optimizer for five epochs, using a learning rate of 0.001, momentum of 0.9, weight decay of 0.0005, and a batch size of 64 per GPU. |
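The reported hyperparameters map directly onto a standard SGD-with-momentum configuration. The sketch below is a dependency-free illustration of one such update step using the paper's reported values; it follows PyTorch's `torch.optim.SGD` semantics (weight decay folded into the gradient before the momentum buffer update), and the scalar parameter and gradient values are purely illustrative, not taken from the paper.

```python
# Hyperparameters reported for MSPE training (SGD, five epochs).
LR = 0.001             # learning rate
MOMENTUM = 0.9         # momentum coefficient
WEIGHT_DECAY = 0.0005  # L2 weight decay
BATCH_SIZE = 64        # batch size per GPU

def sgd_step(param, grad, buf, lr=LR, momentum=MOMENTUM,
             weight_decay=WEIGHT_DECAY):
    """One SGD-with-momentum update on a scalar parameter.

    Mirrors torch.optim.SGD: weight decay is added to the raw
    gradient, then the momentum buffer accumulates that value.
    """
    d = grad + weight_decay * param  # L2 regularization term
    buf = momentum * buf + d         # momentum buffer update
    return param - lr * buf, buf     # descend along the buffer

# Illustrative single step on a scalar "parameter".
p, b = sgd_step(param=1.0, grad=0.5, buf=0.0)
```

In the actual training run this corresponds to `torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=0.0005)`; with 8 GPUs at 64 samples each, the effective global batch size is 512.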