BEiT: BERT Pre-Training of Image Transformers

Authors: Hangbo Bao, Li Dong, Songhao Piao, Furu Wei

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results on image classification and semantic segmentation show that our model achieves competitive results with previous pre-training methods."
Researcher Affiliation | Collaboration | Hangbo Bao, Li Dong, Songhao Piao, Furu Wei (Harbin Institute of Technology; Microsoft Research)
Pseudocode | Yes | Algorithm 1: Blockwise Masking (a minimal re-implementation sketch follows this table)
Open Source Code | Yes | https://github.com/microsoft/unilm
Open Datasets | Yes | "We pretrain BEIT on the training set of ImageNet-1K (Russakovsky et al., 2015), which contains about 1.2M images."
Dataset Splits | No | The paper mentions using the 'training set of ImageNet-1K' and evaluating on the 'ILSVRC-2012 ImageNet dataset', but does not provide specific training/validation/test split percentages or sample counts.
Hardware Specification | Yes | "The 500k training steps take about five days using 16 Nvidia Tesla V100 32GB GPU cards."
Software Dependencies | No | The paper mentions optimizers (Adam) and other models (SETR-PUP) by reference, but does not specify the versions of software libraries or frameworks used (e.g., Python, PyTorch, CUDA versions).
Experiment Setup | Yes | The paper provides detailed hyperparameters in Section 2.5 (Pre-Training Setup) and in the appendices 'G Hyperparameters for Pre-Training' (Table 12), 'H Hyperparameters for Image Classification Fine-Tuning' (Table 13), and 'I Hyperparameters for ADE20K Semantic Segmentation Fine-Tuning' (Table 14), covering learning rates, batch sizes, optimizers, and other settings.
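
The pseudocode referenced above (Algorithm 1, Blockwise Masking) masks contiguous blocks of image patches rather than sampling patches independently at random. The following is a minimal Python sketch of that procedure, not the authors' implementation: the function name blockwise_masking is our own, and the constants (minimum block size of 16 patches, roughly 40% masking ratio, aspect ratios between 0.3 and 1/0.3) follow the values reported in the paper but should be treated as illustrative defaults.

import math
import random

def blockwise_masking(h, w, mask_ratio=0.4, min_block=16, min_aspect=0.3):
    # Hypothetical sketch of blockwise masking over an h x w grid of image
    # patches (e.g. 14 x 14 for a 224x224 image with 16x16 patches).
    # Returns the set of (row, col) patch indices to mask.
    num_patches = h * w
    target = int(mask_ratio * num_patches)
    masked = set()
    while len(masked) < target:
        # Sample a block size (in patches) and an aspect ratio for the next block.
        s = random.randint(min_block, max(min_block, target - len(masked)))
        r = random.uniform(min_aspect, 1.0 / min_aspect)
        a = int(round(math.sqrt(s * r)))   # block height in patches
        b = int(round(math.sqrt(s / r)))   # block width in patches
        if a > h or b > w or a == 0 or b == 0:
            continue  # resample if the block does not fit the patch grid
        t = random.randint(0, h - a)       # top-left corner (row)
        l = random.randint(0, w - b)       # top-left corner (col)
        for i in range(t, t + a):
            for j in range(l, l + b):
                masked.add((i, j))
    return masked

# Example: for a 14 x 14 patch grid (224x224 image, 16x16 patches), this masks
# roughly 40% of the 196 patches in contiguous blocks.
mask = blockwise_masking(14, 14)

Masking contiguous blocks, rather than independent patches, makes the masked-image-modeling task harder, since neighboring patches are highly correlated and could otherwise be trivially reconstructed from their immediate context.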