Corrupted Image Modeling for Self-Supervised Visual Pre-Training
Authors: Yuxin Fang, Li Dong, Hangbo Bao, Xinggang Wang, Furu Wei
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that our approach achieves compelling results in vision benchmarks, such as ImageNet classification and ADE20K semantic segmentation. |
| Researcher Affiliation | Collaboration | Yuxin Fang 1,2; Li Dong 2; Hangbo Bao 2; Xinggang Wang 1; Furu Wei 2. 1 School of EIC, Huazhong University of Science & Technology; 2 Microsoft Research |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described in this paper. It mentions using the 'publicly available pre-trained DALL-E dVAE weight' and implementing 'the pre-training using the codebase of BEiT', but does not provide a link or availability statement for their own implementation. |
| Open Datasets | Yes | ImageNet-1K (Deng et al., 2009) training data is used to pre-train the small BEiT and the enhancer. |
| Dataset Splits | Yes | We study the ViT-B model's 100-epoch fine-tuning performance on the ImageNet-1K val set with different pre-training schedules in Table 11. |
| Hardware Specification | Yes | We conduct experiments on 16 or 32 V100 GPUs with 32GB memory. |
| Software Dependencies | No | The paper mentions software components like the 'AdamW optimizer' and 'Mixed precision and DeepSpeed acceleration' but does not provide specific version numbers for these or other key software dependencies like PyTorch or Python. |
| Experiment Setup | Yes | A.4 PRE-TRAINING & FINE-TUNING CONFIGURATIONS. This section provides detailed settings such as 'Optimizer AdamW', 'Pre-training Epochs 300', 'Peak Learning Rate 1.5e-3', 'Batch Size 2048', and 'Weight Decay 0.05'. |
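
The hyperparameters quoted in the table (AdamW, 300 pre-training epochs, peak learning rate 1.5e-3, batch size 2048, weight decay 0.05, mixed precision) are enough to sketch the outer training loop at a high level. The snippet below is only an illustrative reconstruction under those quoted settings, not the authors' released code: `build_vit_base` and `pretrain_loader` are hypothetical placeholders, and the actual pre-training loss, learning-rate warmup/decay schedule, and DeepSpeed integration from the BEiT codebase are omitted.

```python
# Hedged sketch of the pre-training configuration quoted in the table above.
# `build_vit_base` and `pretrain_loader` are hypothetical placeholders; the
# paper builds on the BEiT codebase, which is not reproduced here.
import torch
from torch.cuda.amp import GradScaler, autocast

EPOCHS = 300          # "Pre-training Epochs 300"
PEAK_LR = 1.5e-3      # "Peak Learning Rate 1.5e-3"
BATCH_SIZE = 2048     # "Batch Size 2048" (global batch across GPUs)
WEIGHT_DECAY = 0.05   # "Weight Decay 0.05"

model = build_vit_base().cuda()                # hypothetical model factory
optimizer = torch.optim.AdamW(model.parameters(),
                              lr=PEAK_LR,
                              weight_decay=WEIGHT_DECAY)
scaler = GradScaler()                          # loss scaling for mixed precision

for epoch in range(EPOCHS):
    for images, _ in pretrain_loader:          # hypothetical ImageNet-1K loader
        optimizer.zero_grad(set_to_none=True)
        with autocast():                       # mixed precision, as mentioned in the paper
            loss = model(images.cuda())        # pre-training loss (details omitted)
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
```

In practice the peak learning rate would be reached after a warmup phase and then decayed (the paper's appendix lists a full schedule), so a real run would wrap the optimizer in a scheduler rather than use a constant rate as shown here.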