Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Pre-training with Random Orthogonal Projection Image Modeling
Authors: Maryam Haghighat, Peyman Moghadam, Shaheer Mohamed, Piotr Koniusz
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we show that using random orthogonal projection leads to superior performance compared to crop-based masking. We demonstrate state-of-the-art results on several popular benchmarks. Figure 1: Training efficiency of ROPIM vs. other methods. ROPIM achieves a higher accuracy (see also LGP-ROPIM) with a lower training time. We perform self-supervised pre-training on ImageNet-1k (Russakovsky et al., 2015). |
| Researcher Affiliation | Collaboration | Maryam Haghighat, Peyman Moghadam, Shaheer Mohamed, Piotr Koniusz*; Data61 CSIRO; Queensland University of Technology; Australian National University |
| Pseudocode | Yes | Algorithm 1 Random Orthogonal Projection Image Modeling (ROPIM). |
| Open Source Code | Yes | The code is available at https://github.com/csiro-robotics/ROPIM. |
| Open Datasets | Yes | We perform self-supervised pre-training on ImageNet-1k (Russakovsky et al., 2015). ImageNet-1K (Russakovsky et al., 2015) used by us is ILSVRC-2012 with 1k classes and 1.3M images. |
| Dataset Splits | Yes | ImageNet-100 train and validation sets contain 1300 and 50 images per class, respectively. ADE20K (Zhou et al., 2019) is a semantic segmentation dataset including 150 semantic categories, 20K training images, 2K validation images, and 3K images for testing. |
| Hardware Specification | Yes | For fair comparisons, the reported times in Figure 1 are derived from the use of the same resources (8 P100 GPUs) and maximum possible batch size per GPU for each method. |
| Software Dependencies | No | The paper mentions software components such as the "AdamW optimizer", "cosine learning rate scheduler", "Mixup", "CutMix", "label smoothing", "drop path", and "RandAugment", but does not specify version numbers for the underlying software (e.g., the PyTorch version, the Python version, or versions of specific libraries). |
| Experiment Setup | Yes | We train our models using the AdamW optimizer, a weight decay of 0.05, β1 = 0.9, β2 = 0.95, and a cosine learning rate scheduler. ViT-B and ViT-S are pre-trained with an initial 10-epoch linear warm-up procedure and a batch size of 1520. For ROPIM, sketching ratio ρ = 1/7 is used unless otherwise mentioned. Table 10 (fine-tuning hyper-parameters for BEiT, MAE, SimMIM and ROPIM): optimizer AdamW; weight decay 0.05; optimizer momentum β1 = 0.9, β2 = 0.999; learning rate schedule cosine decay; warmup epochs 5; label smoothing 0.1; Mixup 0.8; CutMix 1.0; drop path 0.1; RandAugment 9/0.5. |
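The Pseudocode row cites Algorithm 1 (ROPIM) and the Experiment Setup row gives a sketching ratio ρ = 1/7. As a rough illustration of what a random orthogonal projection over patch tokens can look like, the sketch below builds a count-sketch-style matrix R with about ρN rows and forms the orthogonal projector P = Rᵀ(RRᵀ)⁺R, splitting the tokens into PX and its complement (I − P)X. This is a minimal sketch under stated assumptions, not the authors' implementation: the choice of a count sketch, projecting along the token (spatial) axis, and the helper names `count_sketch_matrix`, `orthogonal_projection`, and `random_projection_split` are all hypothetical, and how each component enters the loss is defined by Algorithm 1 in the paper, which is not reproduced here.

```python
import numpy as np


def count_sketch_matrix(n_tokens, ratio, rng):
    """Hypothetical helper: count-sketch matrix R of shape (k, n_tokens).

    k = max(1, round(ratio * n_tokens)). Each column has exactly one
    nonzero entry (+1 or -1) in a uniformly random row.
    """
    k = max(1, int(round(ratio * n_tokens)))
    rows = rng.integers(0, k, size=n_tokens)
    signs = rng.choice([-1.0, 1.0], size=n_tokens)
    R = np.zeros((k, n_tokens))
    R[rows, np.arange(n_tokens)] = signs
    return R


def orthogonal_projection(R):
    """Orthogonal projector onto the row space of R: P = R^T (R R^T)^+ R.

    The pseudo-inverse handles rows of R that received no tokens,
    which would make R R^T singular.
    """
    return R.T @ np.linalg.pinv(R @ R.T) @ R


def random_projection_split(tokens, ratio=1 / 7, seed=0):
    """Hypothetical helper: split (n_tokens, dim) tokens into P X and (I - P) X.

    Assumption: the projection acts along the token axis (axis 0).
    """
    rng = np.random.default_rng(seed)
    R = count_sketch_matrix(tokens.shape[0], ratio, rng)
    P = orthogonal_projection(R)
    projected = P @ tokens            # component inside the random subspace
    complement = tokens - projected   # (I - P) X, the information removed
    return projected, complement, P


if __name__ == "__main__":
    # Assumption: 196 patch tokens (a 14x14 grid) with 64-dim embeddings.
    x = np.random.default_rng(1).standard_normal((196, 64))
    proj, comp, P = random_projection_split(x, ratio=1 / 7)
    print("rank of P:", np.linalg.matrix_rank(P))
```

With ρ = 1/7 and 196 tokens, R has 28 rows, so P has rank at most 28. Because P is symmetric and idempotent, PX and (I − P)X are orthogonal and sum back to X, which is what makes the complement a natural candidate for "what was hidden" in a sketch like this.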
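The Experiment Setup row describes pre-training with a 10-epoch linear warm-up followed by a cosine learning rate schedule (and 5 warm-up epochs for fine-tuning in Table 10). A minimal per-epoch version of that schedule might look like the sketch below. The base learning rate, total number of epochs, and minimum learning rate are not given in this report, so `base_lr`, `total_epochs`, and `min_lr` are hypothetical parameters, and the function name `lr_at_epoch` is illustrative; real training code often updates the rate per iteration rather than per epoch.

```python
import math

# Pre-training values quoted in the Experiment Setup row.
PRETRAIN_CONFIG = {
    "optimizer": "AdamW",
    "weight_decay": 0.05,
    "betas": (0.9, 0.95),
    "warmup_epochs": 10,
    "batch_size": 1520,
    "sketching_ratio": 1 / 7,
}


def lr_at_epoch(epoch, base_lr, total_epochs, warmup_epochs=10, min_lr=0.0):
    """Linear warm-up from 0 to base_lr, then cosine decay to min_lr.

    epoch may be fractional; base_lr, total_epochs, and min_lr are
    hypothetical inputs (not reported in this summary).
    """
    if epoch < warmup_epochs:
        # Linear ramp: 0 at epoch 0, base_lr at the end of warm-up.
        return base_lr * epoch / warmup_epochs
    # Fraction of the post-warm-up phase completed, in [0, 1].
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Hypothetical: base_lr = 1.0 and 100 epochs, just to show the curve shape.
    for e in (0, 5, 10, 55, 100):
        print(e, round(lr_at_epoch(e, base_lr=1.0, total_epochs=100), 4))
```

For the fine-tuning schedule in Table 10, the same function would be called with `warmup_epochs=5`; the AdamW momentum values differ between the two stages (β2 = 0.95 for pre-training, 0.999 for fine-tuning).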