MAST: Masked Augmentation Subspace Training for Generalizable Self-Supervised Priors
Authors: Chen Huang, Hanlin Goh, Jiatao Gu, Joshua M. Susskind
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate that MAST consistently improves generalization on various downstream tasks, while being task-agnostic and efficient during SSL. (Section 4, Experiments) Main results. We start with evaluating our MAST method for SSL on ImageNet. Table 1 shows the results from both linear and semi-supervised evaluations, using the same optimization procedures of our baseline VICReg (Bardes et al., 2022). |
| Researcher Affiliation | Industry | Chen Huang, Hanlin Goh, Jiatao Gu & Josh Susskind, Apple Inc. {chen-huang,hanlin,jgu32,jsusskind}@apple.com |
| Pseudocode | No | The paper describes the method using prose and mathematical equations but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | Our pretraining is mostly performed on the unlabeled ImageNet dataset (Deng et al., 2009)... The same pretraining protocol is followed: pretraining ResNet-18 for 200 epochs on STL-10 dataset (Coates et al., 2011)... |
| Dataset Splits | No | The paper mentions using '1% and 10% ImageNet samples' for semi-supervised classification and the 'training set' for linear classification, but it does not provide explicit percentages for training, validation, and test splits for the main experiments, particularly for a separate validation set. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions the use of the 'LARS optimizer' but does not specify any software dependencies with version numbers, such as programming languages, libraries, or frameworks. |
| Experiment Setup | Yes | Loss coefficients in Eq. (7) are set as α = 25, β = 1 following VICReg (Bardes et al., 2022), and we set λ = 25d/K, λ₁ = 600/(dK) and λ₂ = 25 to generate comparable loss magnitudes. Training details: The training protocol follows that in (Bardes et al., 2022): with batch size 2048, the LARS optimizer (You et al., 2017) runs for 1000 epochs with weight decay 10⁻⁶ and learning rate 1.6. The learning rate follows a cosine annealing schedule (Loshchilov & Hutter, 2017). |
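
The experiment-setup row is the most directly actionable part of this report, so below is a minimal, hypothetical sketch of how the reported hyperparameters could be wired together in PyTorch. It is not the authors' released code: the paper uses the LARS optimizer (You et al., 2017), which is not part of core PyTorch, so plain SGD with an assumed momentum of 0.9 stands in for it; the helper names `loss_coefficients` and `build_optimizer` are invented for illustration; and the embedding dimension `d` and number of augmentation subspaces `K` are left as arguments because their values are set elsewhere in the paper.

```python
# Hypothetical sketch of the pretraining configuration quoted above.
# NOT the authors' code: LARS is replaced by SGD (assumed momentum 0.9),
# and helper names are invented for illustration.
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR

# Hyperparameters as reported in the paper (VICReg protocol, Bardes et al., 2022).
BATCH_SIZE = 2048
EPOCHS = 1000
BASE_LR = 1.6
WEIGHT_DECAY = 1e-6


def loss_coefficients(d: int, K: int):
    """Loss weights from Eq. (7); d = embedding dim, K = number of subspaces."""
    alpha, beta = 25.0, 1.0        # α, β as in VICReg
    lam = 25.0 * d / K             # λ  = 25 d / K
    lam1 = 600.0 / (d * K)         # λ1 = 600 / (d K)
    lam2 = 25.0                    # λ2 = 25
    return alpha, beta, lam, lam1, lam2


def build_optimizer(model: torch.nn.Module):
    """Optimizer and LR schedule matching the reported numbers (SGD stands in for LARS)."""
    optimizer = torch.optim.SGD(
        model.parameters(),
        lr=BASE_LR,
        momentum=0.9,              # assumption; not stated in the quoted text
        weight_decay=WEIGHT_DECAY,
    )
    # Cosine annealing of the learning rate over all pretraining epochs.
    scheduler = CosineAnnealingLR(optimizer, T_max=EPOCHS)
    return optimizer, scheduler
```

A reproduction attempt would still need the pieces the report flags as missing, most notably the masked augmentation subspace loss itself, a LARS implementation, and the values of `d` and `K`, which this sketch deliberately leaves open.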