MAST: Masked Augmentation Subspace Training for Generalizable Self-Supervised Priors

Authors: Chen Huang, Hanlin Goh, Jiatao Gu, Joshua M. Susskind

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments demonstrate that MAST consistently improves generalization on various downstream tasks, while being task-agnostic and efficient during SSL. ... (Section 4, Experiments) Main results. We start with evaluating our MAST method for SSL on ImageNet. Table 1 shows the results from both linear and semi-supervised evaluations, using the same optimization procedures of our baseline VICReg (Bardes et al., 2022). (The linear-evaluation protocol referenced here is sketched after this table.)
Researcher Affiliation | Industry | Chen Huang, Hanlin Goh, Jiatao Gu & Josh Susskind, Apple Inc. {chen-huang,hanlin,jgu32,jsusskind}@apple.com
Pseudocode | No | The paper describes the method using prose and mathematical equations but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statement about releasing source code or a link to a code repository.
Open Datasets | Yes | Our pretraining is mostly performed on the unlabeled ImageNet dataset (Deng et al., 2009)... The same pretraining protocol is followed: pretraining ResNet-18 for 200 epochs on STL-10 dataset (Coates et al., 2011)...
Dataset Splits | No | The paper mentions using '1% and 10% ImageNet samples' for semi-supervised classification and a 'training set' for linear classification, but it does not give explicit training/validation/test split percentages for the main experiments, and in particular does not describe a separate validation set.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies | No | The paper mentions the use of the LARS optimizer but does not specify any software dependencies with version numbers, such as programming languages, libraries, or frameworks.
Experiment Setup | Yes | Loss coefficients in Eq. (7) are set as α = 25, β = 1 following VICReg (Bardes et al., 2022), and we set λ = 25d/K, λ1 = 600/(dK) and λ2 = 25 to generate comparable loss magnitudes. ... Training details: The training protocol follows that in (Bardes et al., 2022): with batch size 2048, the LARS optimizer (You et al., 2017) runs for 1000 epochs with weight decay 10^-6 and learning rate 1.6. The learning rate follows a cosine annealing schedule (Loshchilov & Hutter, 2017).
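For concreteness, the sketch below wires together the hyperparameters quoted in the Experiment Setup row (loss coefficients, batch size, epochs, weight decay, base learning rate, cosine annealing). It is not the authors' code: the values of d (embedding dimension) and K (number of augmentation subspaces) are illustrative placeholders, and the cosine_lr helper is only one plausible reading of the reported schedule.

```python
# Sketch of the reported MAST/VICReg-style training configuration.
# d and K below are placeholder values, not taken from the paper.
import math

# Loss coefficients of Eq. (7), following VICReg defaults.
alpha, beta = 25.0, 1.0

d, K = 8192, 4                  # assumed embedding dimension / number of augmentation subspaces
lam = 25.0 * d / K              # lambda   = 25 d / K
lam1 = 600.0 / (d * K)          # lambda_1 = 600 / (d K)
lam2 = 25.0                     # lambda_2 = 25

# Reported optimization settings (LARS optimizer, VICReg protocol).
batch_size = 2048
epochs = 1000
weight_decay = 1e-6
base_lr = 1.6

def cosine_lr(epoch: int) -> float:
    """Cosine annealing of the learning rate (Loshchilov & Hutter, 2017)."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * epoch / epochs))

if __name__ == "__main__":
    print(f"lambda={lam:.1f}, lambda1={lam1:.5f}, lambda2={lam2:.1f}")
    print(f"learning rate at epoch 500: {cosine_lr(500):.4f}")
```

Tying λ and λ1 to d and K, as in the quote above, keeps the subspace losses on a scale comparable to the VICReg terms.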
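The linear and semi-supervised evaluations mentioned in the Research Type row follow the standard SSL probing setup: the pretrained encoder is frozen and a linear classifier is trained on labeled ImageNet. A minimal sketch is given below, assuming a ResNet-50 backbone and placeholder optimizer settings; none of it is taken from the paper's (unreleased) code.

```python
# Minimal sketch of linear evaluation on a frozen SSL backbone (illustrative only;
# the optimizer settings and backbone loading here are assumptions, not the paper's).
import torch
import torch.nn as nn
from torchvision.models import resnet50

backbone = resnet50()                 # placeholder: load MAST/VICReg-pretrained weights here
backbone.fc = nn.Identity()           # expose the 2048-d pooled features
for p in backbone.parameters():
    p.requires_grad_(False)           # freeze the representation
backbone.eval()

linear_head = nn.Linear(2048, 1000)   # ImageNet-1k classifier
optimizer = torch.optim.SGD(linear_head.parameters(), lr=0.02, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def linear_probe_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One training step: features are computed without gradients, only the head is updated."""
    with torch.no_grad():
        feats = backbone(images)
    loss = criterion(linear_head(feats), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The semi-supervised variant reported in the paper fine-tunes on the 1% or 10% labeled ImageNet subsets instead of training only a frozen-feature classifier.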