MSPE: Multi-Scale Patch Embedding Prompts Vision Transformers to Any Resolution

Authors: Wenzhuo Liu, Fei Zhu, Shijie Ma, Cheng-Lin Liu

NeurIPS 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Experiments in image classification, segmentation, and detection tasks demonstrate the effectiveness of MSPE, yielding superior performance on low-resolution inputs and performance comparable to existing methods on high-resolution inputs. |
| Researcher Affiliation | Academia | Wenzhuo Liu (1,2), Fei Zhu (3), Shijie Ma (1,2), Cheng-Lin Liu (1,2). 1: School of Artificial Intelligence, UCAS; 2: State Key Laboratory of Multimodal Artificial Intelligence Systems, CASIA; 3: Centre for Artificial Intelligence and Robotics, HKISI-CAS. |
| Pseudocode | Yes | Algorithm 1 in Appendix E.2 details the training procedure of MSPE, along with a PyTorch-style implementation. |
| Open Source Code | Yes | Code is available at https://github.com/SmallPigPeppa/MSPE. |
| Open Datasets | Yes | Experiments are conducted on 4 benchmark datasets: ImageNet-1K [26] for classification, ADE20K [27] and Cityscapes [28] for semantic segmentation, and COCO2017 [29] for object detection. |
| Dataset Splits | No | The paper uses benchmark datasets (ImageNet-1K, ADE20K, Cityscapes, COCO2017) that have predefined splits, but it does not explicitly state the split percentages or sample counts for training, validation, and test sets. |
| Hardware Specification | Yes | Experiments run on a machine with two AMD EPYC 7543 32-core processors (two threads per core), 496 GB of memory, and 8× NVIDIA GeForce RTX 4090 graphics cards. |
| Software Dependencies | No | The paper mentions open-source libraries such as timm, MMDetection, MMSegmentation, and PyTorch, but it does not specify version numbers for any of these dependencies. |
| Experiment Setup | Yes | MSPE is trained with the SGD optimizer for five epochs, using a learning rate of 0.001, momentum of 0.9, weight decay of 0.0005, and a batch size of 64 per GPU. |
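The reported hyperparameters map directly onto a standard SGD-with-momentum configuration. The sketch below is a dependency-free illustration of one such update step using the paper's reported values; it follows PyTorch's `torch.optim.SGD` semantics (weight decay folded into the gradient before the momentum buffer update), and the scalar parameter and gradient values are purely illustrative, not taken from the paper.

```python
# Hyperparameters reported for MSPE training (SGD, five epochs).
LR = 0.001             # learning rate
MOMENTUM = 0.9         # momentum coefficient
WEIGHT_DECAY = 0.0005  # L2 weight decay
BATCH_SIZE = 64        # batch size per GPU

def sgd_step(param, grad, buf, lr=LR, momentum=MOMENTUM,
             weight_decay=WEIGHT_DECAY):
    """One SGD-with-momentum update on a scalar parameter.

    Mirrors torch.optim.SGD: weight decay is added to the raw
    gradient, then the momentum buffer accumulates that value.
    """
    d = grad + weight_decay * param  # L2 regularization term
    buf = momentum * buf + d         # momentum buffer update
    return param - lr * buf, buf     # descend along the buffer

# Illustrative single step on a scalar "parameter".
p, b = sgd_step(param=1.0, grad=0.5, buf=0.0)
```

In the actual training run this corresponds to `torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=0.0005)`; with 8 GPUs at 64 samples each, the effective global batch size is 512.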