Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
R-MAE: Regions Meet Masked Autoencoders
Authors: Duy Kien Nguyen, Yanghao Li, Vaibhav Aggarwal, Martin R. Oswald, Alexander Kirillov, Cees G. M. Snoek, Xinlei Chen
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "4 EXPERIMENTS", "4.1 EXPERIMENTAL SETUPS", "4.2 EXPERIMENTAL RESULTS", "Ablation studies." |
| Researcher Affiliation | Collaboration | ¹FAIR, Meta AI; ²University of Amsterdam |
| Pseudocode | No | The paper describes its methods verbally and with diagrams, but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | ¹The code is provided at https://github.com/facebookresearch/r-mae |
| Open Datasets | Yes | Deviating from prior practices (Bao et al., 2022; He et al., 2022), we develop RAE and R-MAE by pre-training on COCO train2017 (Lin et al., 2014)... For fairness, we also pre-train ViTs with R-MAE on ImageNet (Deng et al., 2009)... |
| Dataset Splits | Yes | Deviating from prior practices (Bao et al., 2022; He et al., 2022), we develop RAE and R-MAE by pre-training on COCO train2017 (Lin et al., 2014)... For fairness, we also pre-train ViTs with R-MAE on ImageNet (Deng et al., 2009)... |
| Hardware Specification | No | The paper discusses computational overheads in terms of FLOPs but does not specify any GPU models, CPU types, or other hardware used for running experiments. |
| Software Dependencies | No | The paper refers to frameworks like PyTorch and vision transformers but does not list specific version numbers for software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | Our base learning rate is set to 1e-4... ViT-B (Dosovitskiy et al., 2020) is set as the pixel backbone, and a 1-block, 128-dimensional ViT is used for the neck, the region encoder and the region decoder... k=8 regions are randomly sampled per image... mask ratio of β_R=0.75... The input image size is 1024×1024 with large-scale jitter between a scale range of [0.1, 2.0]. We finetune for 100 epochs with batch size of 64. |
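
The values quoted in the Experiment Setup row can be collected into a single configuration object, which makes them easier to compare against the released code. The following is a minimal sketch, not the authors' configuration: the class name `RMAESetup`, its field names, and the helper `sample_jitter_scale` are hypothetical, and drawing the jitter scale uniformly is an assumption, since the quoted text gives only the range. The quote is elided in places, so the stage to which each value applies should be checked against the paper.

```python
import random
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RMAESetup:
    """Hyperparameters quoted in the Experiment Setup row (field names are hypothetical)."""
    base_lr: float = 1e-4                          # "base learning rate is set to 1e-4"
    pixel_backbone: str = "ViT-B"                  # pixel backbone
    region_blocks: int = 1                         # neck, region encoder, region decoder
    region_dim: int = 128                          # 128-dimensional ViT for those modules
    regions_per_image: int = 8                     # k = 8 regions sampled per image
    region_mask_ratio: float = 0.75                # mask ratio beta_R
    input_image_size: Tuple[int, int] = (1024, 1024)
    jitter_scale_range: Tuple[float, float] = (0.1, 2.0)
    finetune_epochs: int = 100
    finetune_batch_size: int = 64


def sample_jitter_scale(cfg: RMAESetup, rng: random.Random) -> float:
    """Draw one large-scale-jitter factor; uniform sampling is an assumption."""
    low, high = cfg.jitter_scale_range
    return rng.uniform(low, high)


if __name__ == "__main__":
    cfg = RMAESetup()
    scale = sample_jitter_scale(cfg, random.Random(0))
    print(cfg)
    print(f"sampled jitter scale: {scale:.3f}")
```

Freezing the dataclass keeps the recorded values immutable, so any deviation in a rerun has to be made explicitly by constructing a new object.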