MSPE: Multi-Scale Patch Embedding Prompts Vision Transformers to Any Resolution
Authors: Wenzhuo Liu, Fei Zhu, Shijie Ma, Cheng-Lin Liu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments in image classification, segmentation, and detection tasks demonstrate the effectiveness of MSPE, yielding superior performance on low-resolution inputs and performing comparably to existing methods on high-resolution inputs. |
| Researcher Affiliation | Academia | Wenzhuo Liu (1,2), Fei Zhu (3), Shijie Ma (1,2), Cheng-Lin Liu (1,2); (1) School of Artificial Intelligence, UCAS; (2) State Key Laboratory of Multimodal Artificial Intelligence Systems, CASIA; (3) Centre for Artificial Intelligence and Robotics, HKISI-CAS |
| Pseudocode | Yes | Algorithm 1 in Appendix E.2 details the training procedure of MSPE and a PyTorch-style implementation (a hedged sketch of the core idea appears below the table). |
| Open Source Code | Yes | Code is available at https://github.com/SmallPigPeppa/MSPE. |
| Open Datasets | Yes | We conduct experiments on 4 benchmark datasets: ImageNet-1K [26] for classification tasks, ADE20K [27] and Cityscapes [28] for semantic segmentation, and COCO2017 [29] for object detection. |
| Dataset Splits | No | The paper uses benchmark datasets like ImageNet-1K, ADE20K, Cityscapes, and COCO2017, which have predefined splits. However, it does not explicitly state the split percentages or sample counts for training, validation, and test sets within the paper. |
| Hardware Specification | Yes | This paper conducts experiments on a machine equipped with two AMD EPYC 7543 32-core processors (two threads per core), 496 GB of memory, and 8× NVIDIA GeForce RTX 4090 graphics cards. |
| Software Dependencies | No | The paper mentions using open-sourced libraries such as timm, MMDetection, MMSegmentation, and PyTorch, but it does not specify the version numbers for any of these software dependencies. |
| Experiment Setup | Yes | MSPE is trained using the SGD optimizer for five epochs, with a learning rate of 0.001, momentum of 0.9, weight decay of 0.0005, and a batch size of 64 per GPU (a hedged sketch of this configuration appears below the table). |
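
To make the Pseudocode row concrete, below is a minimal PyTorch sketch of the idea Algorithm 1 implements: replacing the ViT's single patch-embedding layer with several strided convolutional embeddings of different kernel sizes, one of which is selected per input resolution so the token sequence length stays fixed. The class and parameter names (`MultiScalePatchEmbed`, `patch_sizes`, `num_tokens`) are illustrative assumptions, not the authors' API; the released repository above is authoritative.

```python
import torch
import torch.nn as nn


class MultiScalePatchEmbed(nn.Module):
    """Illustrative multi-scale patch embedding: one conv per patch size."""

    def __init__(self, embed_dim=768, num_tokens=14, patch_sizes=(4, 8, 16, 32)):
        super().__init__()
        self.num_tokens = num_tokens  # tokens per spatial side (14 -> 196 patches)
        # One strided conv per supported patch size (kernel == stride).
        self.embeds = nn.ModuleDict({
            str(p): nn.Conv2d(3, embed_dim, kernel_size=p, stride=p)
            for p in patch_sizes
        })

    def forward(self, x):
        # Pick the patch size that maps this resolution to num_tokens per side,
        # so low- and high-resolution inputs yield the same sequence length.
        p = max(1, x.shape[-1] // self.num_tokens)
        key = str(p) if str(p) in self.embeds else min(
            self.embeds, key=lambda k: abs(int(k) - p))  # nearest supported size
        x = self.embeds[key](x)              # (B, D, H', W')
        return x.flatten(2).transpose(1, 2)  # (B, H'*W', D)


if __name__ == "__main__":
    embed = MultiScalePatchEmbed()
    for res in (56, 112, 224, 448):
        tokens = embed(torch.randn(1, 3, res, res))
        print(res, tokens.shape)  # 196 tokens at every resolution
```

Note that this sketch only shows the resolution-routing mechanics; in the paper, MSPE adapts the patch-embedding layer of a pretrained ViT while the rest of the model stays unchanged, rather than training arbitrary fresh kernels.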
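
Likewise, here is a minimal sketch of the reported experiment setup (SGD, learning rate 0.001, momentum 0.9, weight decay 0.0005, batch size 64 per GPU, five epochs), assuming, consistent with the paper's low-cost-training claim, that only the patch-embedding parameters are updated. The modules and data below are random stand-ins, not the actual ImageNet-1K pipeline.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-ins: a patch embedding plus a frozen head. Only the patch
# embedding receives gradients in this sketch; the pretrained ViT
# backbone would likewise stay frozen.
patch_embed = nn.Conv2d(3, 768, kernel_size=16, stride=16)
head = nn.Linear(768, 1000)
for param in head.parameters():
    param.requires_grad_(False)

# Reported configuration: SGD, lr 0.001, momentum 0.9, weight decay 0.0005.
optimizer = torch.optim.SGD(
    patch_embed.parameters(), lr=0.001, momentum=0.9, weight_decay=0.0005
)

# One illustrative step on random data; real runs use ImageNet-1K batches
# of 64 per GPU for five epochs.
images = torch.randn(4, 3, 224, 224)
labels = torch.randint(0, 1000, (4,))
tokens = patch_embed(images).flatten(2).mean(-1)  # (B, 768) pooled features
loss = F.cross_entropy(head(tokens), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```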