Towards Understanding Why Mask Reconstruction Pretraining Helps in Downstream Tasks

Authors: Jiachun Pan, Pan Zhou, Shuicheng Yan

ICLR 2023

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Experimental results testify to our data assumptions and also our theoretical implications." (Abstract; repeated in the Introduction); Section 5: Experiments (section title) |
| Researcher Affiliation | Collaboration | Jiachun Pan (1, 2), Pan Zhou (1), Shuicheng Yan (1); 1: Sea AI Lab, 2: National University of Singapore; pan.jiachun@u.nus.edu, {zhoupan,yansc}@sea.com |
| Pseudocode | No | The paper presents mathematical derivations and procedures (e.g., Appendix E.1: "The derivation of above loss function is shown as follows."), but it contains no clearly labeled "Pseudocode" or "Algorithm" block with structured steps. |
| Open Source Code | No | Footnotes 2, 3, and 4 in Appendix B link to the "official trained SL model", "official trained MAE model", and "official trained data2vec model". These are models from other research groups that the authors used as baselines or for comparison, not the source code for their own proposed method. The authors make no statement about releasing their own code. |
| Open Datasets | Yes | "We pretrain for 300 epochs on ImageNet, and fine-tune pretrained ResNet50 for 100 epochs on ImageNet." (Section 5); "on ImageNet (Deng et al., 2009)" (Section 5); VOC07+12 (Table 1) |
| Dataset Splits | No | "We pretrain for 300 epochs on ImageNet, and fine-tune pretrained ResNet50 for 100 epochs on ImageNet." (Section 5). The paper does not specify how the data was split into training, validation, and test sets, nor does it mention a validation set for hyperparameter tuning or early stopping. |
| Hardware Specification | No | The paper gives no details about the hardware used for the experiments (e.g., GPU models, CPU types, memory). |
| Software Dependencies | No | The paper mentions "ResNet50 (He et al., 2016b) trained by the PyTorch Team" and "ViT-base models (Dosovitskiy et al., 2020)", but it does not list the software dependencies (e.g., libraries or frameworks) with version numbers needed to reproduce the experiments. |
| Experiment Setup | No | The paper states "pretrain for 300 epochs on ImageNet, and fine-tune pretrained ResNet50 for 100 epochs on ImageNet" (Section 5) and describes the masking process (e.g., "randomly mask input patches", with Pr(ε_i = 1) = θ; see the sketch below this table). It also notes that "the learning rate η1 is often much smaller than η2 in practice." (Section 3.2). However, concrete hyperparameter values such as learning rates, batch sizes, or optimizer settings are not given in the main text. |
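
For readers unfamiliar with the masking step quoted in the Experiment Setup row, the sketch below illustrates independent Bernoulli patch masking with Pr(ε_i = 1) = θ. This is a minimal illustration, not the authors' code (none was released); the tensor layout and the zeroing-out of masked patches are assumptions, since MAE-style pipelines typically drop masked patches or substitute a learned [MASK] token instead.

```python
import torch

def bernoulli_patch_mask(patches: torch.Tensor, theta: float):
    """Mask each patch independently with probability theta, i.e. Pr(eps_i = 1) = theta.

    patches: (batch, num_patches, dim) patch embeddings -- this shape is an assumption.
    Returns (masked_patches, eps), where eps[b, i] = 1 marks a masked patch.
    """
    batch, num_patches, _ = patches.shape
    # Draw eps_i ~ Bernoulli(theta) independently for every patch.
    eps = torch.bernoulli(torch.full((batch, num_patches), theta, device=patches.device))
    # Zero out masked patches; MAE-style pipelines instead drop them or insert a
    # learned [MASK] token -- zeroing is a simplification for illustration only.
    masked = patches * (1.0 - eps).unsqueeze(-1)
    return masked, eps


# Usage: mask patches with theta = 0.75 for a batch of 8 images of 196 patches each.
x = torch.randn(8, 196, 768)
masked_x, eps = bernoulli_patch_mask(x, theta=0.75)
```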