Pre-training with Random Orthogonal Projection Image Modeling
Authors: Maryam Haghighat, Peyman Moghadam, Shaheer Mohamed, Piotr Koniusz
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we show that using random orthogonal projection leads to superior performance compared to crop-based masking. We demonstrate state-of-the-art results on several popular benchmarks. Figure 1: Training efficiency of ROPIM vs. other methods. ROPIM achieves a higher accuracy (see also LGP-ROPIM) with a lower training time. We perform self-supervised pre-training on ImageNet-1k (Russakovsky et al., 2015). |
| Researcher Affiliation | Collaboration | Maryam Haghighat, Peyman Moghadam, Shaheer Mohamed, Piotr Koniusz*; Data61 CSIRO, Queensland University of Technology, Australian National University; name.lastname@qut.edu.au, name.lastname@data61.csiro.au |
| Pseudocode | Yes | Algorithm 1 Random Orthogonal Projection Image Modeling (ROPIM). |
| Open Source Code | Yes | The code is available at https://github.com/csiro-robotics/ROPIM. |
| Open Datasets | Yes | We perform self-supervised pre-training on ImageNet-1k (Russakovsky et al., 2015). ImageNet-1K (Russakovsky et al., 2015) used by us is ILSVRC-2012 with 1k classes and 1.3M images. |
| Dataset Splits | Yes | ImageNet-100 train and validation sets contain 1300 and 50 images per class, respectively. ADE20K (Zhou et al., 2019) is a semantic segmentation dataset including 150 semantic categories, 20K training images, 2K validation images, and 3K images for testing. |
| Hardware Specification | Yes | For fair comparisons, the reported times in Figure 1 are derived from the use of the same resources (8 P100 GPUs) and maximum possible batch size per GPU for each method. |
| Software Dependencies | No | The paper mentions software components such as the AdamW optimizer, a cosine learning rate scheduler, Mixup, CutMix, label smoothing, drop path, and RandAugment, but does not specify version numbers (e.g., PyTorch version, Python version, or versions of specific libraries). |
| Experiment Setup | Yes | We train our models using the AdamW optimizer, a weight decay of 0.05, β1 = 0.9, β2 = 0.95, and a cosine learning rate scheduler. ViT-B and ViT-S are pre-trained with an initial 10-epoch linear warm-up and a batch size of 1520. For ROPIM, a sketching ratio ρ = 1/7 is used unless otherwise mentioned. Table 10: Fine-tuning hyper-parameters for BEiT, MAE, SimMIM and ROPIM. Optimizer: AdamW; Weight decay: 0.05; Optimizer momentum: β1 = 0.9, β2 = 0.999; Learning rate schedule: cosine decay; Warmup epochs: 5; Label smoothing: 0.1; Mixup: 0.8; CutMix: 1.0; Drop path: 0.1; RandAugment: 9/0.5. |
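
The Pseudocode row above refers to Algorithm 1 (ROPIM), which replaces crop-based masking with a random orthogonal projection of the patch tokens. The paper's exact construction is not reproduced in this report, so the snippet below is only a minimal, illustrative sketch: it projects the token axis onto a random orthonormal subspace whose size is set by a ratio `rho` (quoted as ρ = 1/7). Whether ρ denotes the fraction of dimensions kept or removed, and how the projection enters the pre-training loss, are assumptions here; consult the released code at https://github.com/csiro-robotics/ROPIM for the authoritative implementation.

```python
import torch


def random_orthogonal_projection(patches: torch.Tensor, rho: float = 1 / 7) -> torch.Tensor:
    """Illustrative random orthogonal projection of patch tokens.

    patches: tensor of shape (B, N, D) -- a batch of N patch tokens with D features.
    rho:     sketching ratio; interpreted here (assumption) as the fraction of the
             token dimension retained by the random subspace.
    """
    B, N, D = patches.shape
    k = max(1, int(round(rho * N)))

    # Random orthonormal basis: QR decomposition of a Gaussian matrix gives Q with
    # k orthonormal columns of length N.
    q, _ = torch.linalg.qr(torch.randn(N, k, device=patches.device, dtype=patches.dtype))

    # P = Q Q^T projects the token axis onto the k-dimensional random subspace,
    # discarding the remaining information -- loosely analogous to masking patches.
    proj = q @ q.transpose(0, 1)                       # (N, N)
    return torch.einsum("nm,bmd->bnd", proj, patches)  # (B, N, D)
```
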
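The Experiment Setup row quotes the pre-training recipe (AdamW, weight decay 0.05, β1 = 0.9, β2 = 0.95, cosine schedule, 10-epoch linear warm-up). A minimal PyTorch sketch of that optimizer/schedule pairing is given below; the base learning rate, total epochs, and steps per epoch are not stated in the quoted excerpt and are hypothetical placeholders.

```python
import math

import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR


def build_optimizer_and_scheduler(model: torch.nn.Module,
                                  base_lr: float = 1.5e-3,   # placeholder, not from the excerpt
                                  total_epochs: int = 100,   # placeholder
                                  warmup_epochs: int = 10,   # quoted warm-up length
                                  steps_per_epoch: int = 1000):  # placeholder
    # AdamW with the quoted pre-training hyper-parameters.
    optimizer = AdamW(model.parameters(), lr=base_lr,
                      weight_decay=0.05, betas=(0.9, 0.95))

    total_steps = total_epochs * steps_per_epoch
    warmup_steps = warmup_epochs * steps_per_epoch

    def lr_lambda(step: int) -> float:
        # Linear warm-up for the first warmup_steps, then cosine decay to zero.
        if step < warmup_steps:
            return step / max(1, warmup_steps)
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return 0.5 * (1.0 + math.cos(math.pi * progress))

    scheduler = LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler


# Example usage with a stand-in model; step the scheduler once per optimizer step.
model = torch.nn.Linear(768, 768)
optimizer, scheduler = build_optimizer_and_scheduler(model)
```
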