Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Memory-Efficient Transformer Adapter for Dense Predictions

Authors: Dong Zhang, Rui Yan, Pingcheng Dong, Kwang-Ting Cheng

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, extensive evaluations on multiple representative datasets validate that META substantially enhances the predicted quality, while achieving a new state-of-the-art accuracy-efficiency trade-off. Theoretically, we demonstrate that META exhibits superior generalization capability and stronger adaptability.
Researcher Affiliation | Academia | Dong Zhang (1,2), Rui Yan (3), Pingcheng Dong (1), Kwang-Ting Cheng (1); (1) The Hong Kong University of Science and Technology, (2) AI Chip Center for Emerging Smart Systems (ACCESS), (3) Nanjing University. EMAIL;EMAIL;EMAIL
Pseudocode | No | The paper describes the architecture and computational processes using mathematical formulas and descriptive text, but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | No | The paper does not contain an explicit statement about releasing source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets | Yes | To facilitate a fair result comparison with existing methods, we conduct experiments, including the ablation analysis, on two commonly used datasets: MS-COCO (Caesar et al., 2018) for ODet and ISeg, and ADE20K (Zhou et al., 2017) for SSeg.
Dataset Splits | Yes | We report the experimental results on the val set of MS-COCO (Caesar et al., 2018), where the ImageNet-1k pre-trained ViT-B (Li et al., 2022b) is used as the backbone. For SSeg, we choose UperNet (Xiao et al., 2018) with 160k iterations as the baseline, where the ImageNet-1k pre-trained ViT-B (Li et al., 2022b) is used as the backbone. We report the single-scale testing results on the val set of ADE20K (Zhou et al., 2017).
Hardware Specification | Yes | The reported inference results are measured by A100 GPUs with per-GPU batch size 2.
Software Dependencies | No | The paper mentions various models and baselines (e.g., Mask R-CNN, Cascade Mask R-CNN, ViT-Adapter) but does not specify software versions for programming languages, libraries, or frameworks used for implementation (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | Unless otherwise specified, these baselines are set up to be consistent with their papers and the settings of the ViT-Adapter (Chen et al., 2022b) method. Even with different training schedules (i.e., 1× and 3× with MS), our method can also improve the model performance, demonstrating the plug-and-play advantage of META. For SSeg, we choose UperNet (Xiao et al., 2018) with 160k iterations as the baseline, where the ImageNet-1k pre-trained ViT-B (Li et al., 2022b) is used as the backbone.