Stochastic positional embeddings improve masked image modeling

Authors: Amir Bar, Florian Bordes, Assaf Shocher, Mido Assran, Pascal Vincent, Nicolas Ballas, Trevor Darrell, Amir Globerson, Yann LeCun

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 4. Experiments and Results
Researcher Affiliation | Collaboration | 1 Tel Aviv University, 2 UC Berkeley, 3 Meta AI (FAIR), 4 Now also at Google Research, 5 New York University.
Pseudocode | Yes | Algorithm 1: MIM w/ StoP pseudo-code; it requires only a minor implementation change, highlighted in light gray. (An illustrative sketch of this change appears after the table.)
Open Source Code | Yes | See https://github.com/amirbar/StoP for code.
Open Datasets | Yes | ImageNet (IN-1k) (Russakovsky et al., 2015), Places205 (Zhou et al., 2014a), iNaturalist 2018 (Van Horn et al., 2018), and CIFAR100 (Krizhevsky, 2009).
Dataset Splits | No | The paper states training on the full IN-1k dataset for a certain number of epochs and evaluating via linear probing on subsets like '1% of IN-1k', but it does not explicitly define standard train/validation/test splits with percentages or sample counts for the main model training process.
Hardware Specification | Yes | Here we pretrain all models for 300 epochs using 4 V100 nodes, on a total batch size of 2048.
Software Dependencies | No | The paper mentions the AdamW optimizer but does not provide specific version numbers for software libraries, programming languages, or other dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | The paper provides detailed pretraining settings in tables (e.g., Tables 9, 10, 11, 12), including the optimizer (AdamW), epochs (300/600), learning rate, weight decay, batch size (2048), learning rate schedule, warmup epochs, predictor depth, attention heads, embedding dimension, and the noise hyperparameter σ. (A hedged configuration sketch follows the table.)
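
On the Pseudocode row: StoP is described as a minor implementation change to masked image modeling, making the positional information supplied for masked targets stochastic. The sketch below is a minimal illustration of that idea, assuming the core operation is adding Gaussian noise scaled by σ to the target positional embeddings; the function name, signature, and eval-time behavior are assumptions of this sketch, not the authors' code (see the repository linked above for the actual Algorithm 1).

```python
# Illustrative sketch only: adds zero-mean Gaussian noise, scaled by sigma, to
# the positional embeddings of masked target tokens. Names are hypothetical;
# the paper's Algorithm 1 may differ in details (e.g., it may pass the noise
# through a learned projection). See https://github.com/amirbar/StoP.
import torch


def stochastic_pos_embed(pos_embed: torch.Tensor, sigma: float, training: bool = True) -> torch.Tensor:
    """Perturb target positional embeddings with Gaussian noise of scale sigma."""
    if not training or sigma == 0.0:
        # Deterministic positions when noise is disabled (e.g., at evaluation).
        return pos_embed
    return pos_embed + sigma * torch.randn_like(pos_embed)


# Hypothetical usage inside an MIM predictor: the query for each masked target
# patch is built from a mask token plus its (now stochastic) positional embedding.
# queries = mask_token + stochastic_pos_embed(target_pos, sigma=0.25)
```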
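On the Experiment Setup row: the sketch below shows, purely for illustration, how the reported settings could be assembled into a pretraining configuration. Only the optimizer (AdamW), the 300-epoch schedule, the total batch size of 2048, and the existence of a noise scale σ come from the text above; every value marked as a placeholder is an assumption, not a number from the paper's tables.

```python
# Hedged configuration sketch. Values flagged "placeholder" are NOT taken from
# the paper; they only show where the settings listed in the table would go.
import torch

config = {
    "epochs": 300,              # 300 (600 in some settings, per Tables 9-12)
    "total_batch_size": 2048,   # as reported (4 V100 nodes)
    "lr": 1.5e-4,               # placeholder
    "weight_decay": 0.05,       # placeholder
    "warmup_epochs": 15,        # placeholder
    "sigma": 0.25,              # StoP noise scale; placeholder value
}

# Stand-in module; in the paper this would be the ViT encoder/predictor.
model = torch.nn.Linear(768, 768)
optimizer = torch.optim.AdamW(
    model.parameters(), lr=config["lr"], weight_decay=config["weight_decay"]
)
```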