Corrupted Image Modeling for Self-Supervised Visual Pre-Training

Authors: Yuxin Fang, Li Dong, Hangbo Bao, Xinggang Wang, Furu Wei

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that our approach achieves compelling results in vision benchmarks, such as ImageNet classification and ADE20K semantic segmentation.
Researcher Affiliation | Collaboration | Yuxin Fang (1, 2), Li Dong (2), Hangbo Bao (2), Xinggang Wang (1), Furu Wei (2); 1: School of EIC, Huazhong University of Science & Technology; 2: Microsoft Research
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology it describes. It mentions using the publicly available pre-trained DALL-E dVAE weights and implementing the pre-training with the BEiT codebase, but gives no link or release statement for the authors' own implementation.
Open Datasets | Yes | ImageNet-1K (Deng et al., 2009) training data is used to pre-train the small BEiT and the enhancer.
Dataset Splits | Yes | "We study the ViT-B model's 100-epoch fine-tuning performance on the ImageNet-1K val set with different pre-training schedules in Table 11."
Hardware Specification | Yes | "We conduct experiments on 16 or 32 V100 GPUs with 32GB memory."
Software Dependencies | No | The paper mentions software components such as the AdamW optimizer and mixed precision with deepspeed acceleration, but does not give version numbers for these or for other key dependencies such as PyTorch or Python.
Experiment Setup | Yes | Appendix A.4 (Pre-training & Fine-tuning Configurations) lists detailed settings such as optimizer AdamW, 300 pre-training epochs, peak learning rate 1.5e-3, batch size 2048, and weight decay 0.05 (see the configuration sketch below).
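
The Experiment Setup row quotes the hyperparameters reported in the paper's Appendix A.4. The following is a minimal sketch, assuming a PyTorch-style setup (the paper states it builds on the BEiT codebase), of how those reported values could map to an optimizer configuration; the model and the CIM pre-training objective are placeholders, and none of the function names below come from the paper's (unreleased) code.

```python
# Hedged sketch: reported pre-training hyperparameters from Appendix A.4,
# wired into a standard PyTorch AdamW optimizer. Assumptions: PyTorch is the
# framework, and `build_optimizer` is a hypothetical helper, not the authors' API.
import torch

pretrain_config = {
    "optimizer": "AdamW",      # reported optimizer
    "epochs": 300,             # reported pre-training epochs
    "peak_lr": 1.5e-3,         # reported peak learning rate
    "batch_size": 2048,        # reported batch size
    "weight_decay": 0.05,      # reported weight decay
}

def build_optimizer(model: torch.nn.Module) -> torch.optim.Optimizer:
    # AdamW with the reported peak learning rate and weight decay; any warmup
    # or decay schedule toward the peak LR would be applied separately.
    return torch.optim.AdamW(
        model.parameters(),
        lr=pretrain_config["peak_lr"],
        weight_decay=pretrain_config["weight_decay"],
    )
```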