R-MAE: Regions Meet Masked Autoencoders
Authors: Duy Kien Nguyen, Yanghao Li, Vaibhav Aggarwal, Martin R. Oswald, Alexander Kirillov, Cees G. M. Snoek, Xinlei Chen
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section headings include "4 EXPERIMENTS", "4.1 EXPERIMENTAL SETUPS", "4.2 EXPERIMENTAL RESULTS", and "Ablation studies." |
| Researcher Affiliation | Collaboration | 1 FAIR, Meta AI; 2 University of Amsterdam |
| Pseudocode | No | The paper describes its methods verbally and with diagrams, but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is provided at https://github.com/facebookresearch/r-mae |
| Open Datasets | Yes | Deviating from prior practices (Bao et al., 2022; He et al., 2022), we develop RAE and R-MAE by pre-training on COCO train2017 (Lin et al., 2014)... For fairness, we also pre-train ViTs with R-MAE on ImageNet (Deng et al., 2009)... |
| Dataset Splits | Yes | Deviating from prior practices (Bao et al., 2022; He et al., 2022), we develop RAE and R-MAE by pre-training on COCO train2017 (Lin et al., 2014)... For fairness, we also pre-train ViTs with R-MAE on ImageNet (Deng et al., 2009)... |
| Hardware Specification | No | The paper discusses computational overheads in terms of FLOPs but does not specify any GPU models, CPU types, or other hardware used for running experiments. |
| Software Dependencies | No | The paper refers to frameworks like PyTorch and vision transformers but does not list specific version numbers for software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | Our base learning rate is set to 1e-4... ViT-B (Dosovitskiy et al., 2020) is set as the pixel backbone, and a 1-block, 128-dimensional ViT is used for the neck, the region encoder and the region decoder... k=8 regions are randomly sampled per image... mask ratio of βR=0.75... The input image size is 1024×1024 with large-scale jitter between a scale range of [0.1, 2.0]. We finetune for 100 epochs with batch size of 64. |
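For readability, the hyperparameters quoted in the Experiment Setup row can be collected into config dictionaries. This is an illustrative sketch only; the key names below are assumptions and do not come from the official facebookresearch/r-mae code.

```python
# Hypothetical config sketch summarizing the hyperparameters quoted above
# (key names are illustrative, not from the R-MAE repository).

pretrain_config = {
    "base_lr": 1e-4,               # base learning rate
    "pixel_backbone": "ViT-B",     # pixel branch backbone
    "region_vit_blocks": 1,        # 1-block ViT for neck / region encoder / decoder
    "region_vit_dim": 128,         # 128-dimensional region ViT
    "regions_per_image": 8,        # k = 8 regions randomly sampled per image
    "region_mask_ratio": 0.75,     # beta_R = 0.75
}

finetune_config = {
    "image_size": (1024, 1024),    # input resolution
    "jitter_scale_range": (0.1, 2.0),  # large-scale jitter
    "epochs": 100,
    "batch_size": 64,
}

print(pretrain_config["regions_per_image"], finetune_config["epochs"])
```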