Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Masked Image Residual Learning for Scaling Deeper Vision Transformers
Authors: Guoxi Huang, Hongtao Fu, Adrian G. Bors
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The proposed MIRL method is evaluated on image classification, object detection and semantic segmentation tasks. All models are pre-trained on ImageNet-1K and then fine-tuned in downstream tasks. ... Table 2: MIRL ablation experiments on ImageNet-1K |
| Researcher Affiliation | Collaboration | Guoxi Huang (Baidu Inc.); Hongtao Fu (Huazhong University of Science and Technology); Adrian G. Bors (University of York) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and pretrained models are available at: https://github.com/russellllaputa/MIRL. |
| Open Datasets | Yes | We pre-train all models on the training set of ImageNet-1K with 32 GPUs. ... The experiment is conducted on MS COCO [30]... We compare our method with previous results on the ADE20K [61] dataset |
| Dataset Splits | Yes | All models are pre-trained on ImageNet-1K and then fine-tuned in downstream tasks. ... Table 2: MIRL ablation experiments on ImageNet-1K: We report the fine-tuning (ft) accuracy (%) for all models, which are pre-trained for 300 epochs. |
| Hardware Specification | No | We pre-train all models on the training set of ImageNet-1K with 32 GPUs. |
| Software Dependencies | No | The paper mentions frameworks and libraries such as Transformer architecture, MAE, Mask R-CNN, and mmdetection, but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | Pre-training setup. We pre-train all models on the training set of ImageNet-1K with 32 GPUs. By default, ViT-B-24 is divided into 4 segments, while ViT-S-54 and ViT-B-48 are split into 6 segments, and others into 2. Each appended decoder has 2 Transformer blocks with an injected DID module. We follow the setup in [21], masking 75% of visual tokens and applying basic data augmentation, including random horizontal flipping and random resized cropping. Full implementation details are in Appendix A. |
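
The Experiment Setup row mentions two mechanical details: masking 75% of visual tokens and dividing the encoder blocks into segments. The sketch below illustrates both in plain Python. It is not the authors' implementation (their code is in the repository linked in the table). The function names, the 196-token sequence length (a 14 x 14 patch grid), and the assumption that segments are contiguous and equally sized are all hypothetical choices made for illustration.

```python
import random


def random_mask_indices(num_tokens, mask_ratio=0.75, seed=None):
    """Randomly split token indices into visible and masked sets.

    Assumption: masking is uniform random per sample, in the style of
    masked-autoencoder pre-training. Names here are illustrative only.
    """
    rng = random.Random(seed)
    perm = list(range(num_tokens))
    rng.shuffle(perm)
    num_keep = int(num_tokens * (1 - mask_ratio))
    visible = sorted(perm[:num_keep])
    masked = sorted(perm[num_keep:])
    return visible, masked


def split_into_segments(num_blocks, num_segments):
    """Split block indices 0..num_blocks-1 into contiguous segments.

    Assumption: segments are equally sized; the summary above only
    states the number of segments, not how blocks are assigned.
    """
    if num_blocks % num_segments != 0:
        raise ValueError("num_blocks must be divisible by num_segments")
    size = num_blocks // num_segments
    return [list(range(i * size, (i + 1) * size)) for i in range(num_segments)]


if __name__ == "__main__":
    # 196 tokens is an assumed sequence length (14 x 14 patches).
    visible, masked = random_mask_indices(196, mask_ratio=0.75, seed=0)
    print(len(visible), len(masked))

    # Segment counts quoted in the table: 4 for ViT-B-24, 6 for ViT-B-48.
    print([len(s) for s in split_into_segments(24, 4)])
    print([len(s) for s in split_into_segments(48, 6)])
```

Under these assumptions, a 24-block encoder split into 4 segments yields 6 blocks per segment, and a 48-block encoder split into 6 segments yields 8 blocks per segment. How the paper actually assigns blocks to segments should be checked against Appendix A or the released code.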