Self-Guided Masked Autoencoders for Domain-Agnostic Self-Supervised Learning

Authors: Johnathan Wenjia Xie, Yoonho Lee, Annie S Chen, Chelsea Finn

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate SMA on three self-supervised learning benchmarks in protein biology, chemical property prediction, and particle physics. We find SMA is capable of learning representations without domain-specific knowledge and achieves state-of-the-art performance on these three benchmarks."
Researcher Affiliation | Academia | Johnathan Xie (Stanford University), Yoonho Lee (Stanford University), Annie S. Chen (Stanford University), Chelsea Finn (Stanford University)
Pseudocode | Yes | Algorithm 1 (SMA training), reconstructed below; an illustrative training-loop sketch follows this table.
  Input: data D = {X(1), ...}, reconstruction loss ℓ, initial parameters θ0, masking ratio r
  while not converged do
    X ∼ D
    A ← Attention(θ; X)          (Sec. 3.3)
    Â ← ApplyMask(A, r)          (Eq. 5)
    H ← Encode(θ; Â, X)          (Sec. 4.2)
    O ← Upsample(θ; H)           (Eq. 6)
    θ ← θ − α ∇θ ℓ(O, X)         (Sec. 4.2)
  Return θ
Open Source Code | Yes | "We make the code available at this link."
Open Datasets | Yes | "We follow the benchmark setting of TAPE (Rao et al., 2019) which involves pre-training over the Pfam training set for 10 epochs."; "We compare SMA to the previous state-of-the-art works on MoleculeNet regression and classification tasks (Wu et al., 2018)"; "We pre-train on the training set of the GuacaMol (Brown et al., 2019) dataset containing 1.6M molecules"; "We use the HIGGS (Whiteson, 2014) particle physics dataset"; "For natural language pre-training, we use a concatenation of English Wikipedia and BooksCorpus and assess the representation quality by fine-tuning over the GLUE (Wang et al., 2018) benchmark."; "To assess image domain self-supervised learning, we pre-train all models on the ImageNet-100 (Tian et al., 2020) subset."
Dataset Splits | Yes | "We fine-tune over a selected subset of the training set of size 100,000, 10,000, and 1000 samples and report binary classification accuracy in Table 3a."; "We use the cross validation splits from deepchem (Ramsundar et al., 2019) which uses scaffold splitting to test generalization across different molecule types." (An illustrative scaffold-split sketch follows this table.)
Hardware Specification | No | "The compute for this work was supported by a HAI Google Cloud credit grant sponsored by Fei-Fei Li."
Software Dependencies | No | "For all experiments, we optimize using Adam (Kingma & Ba, 2014) with decoupled weight decay (Loshchilov & Hutter, 2017) and set β1 = 0.9, β2 = 0.999."
Experiment Setup | Yes | "For all experiments, we provide further details on hyperparameters and model configurations in Section B of the appendix."; hyperparameters are given in Tables 8-12, e.g. Table 8b:
  config          random / guided
  weight decay    0.01
  learning rate   1e-4 / 3e-4
  epochs          10
  batch size      256
  masking rate    0.2 / 0.15
  loss function   cross-entropy
(An illustrative optimizer sketch follows this table.)
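
To make the reconstructed Algorithm 1 concrete, here is a minimal PyTorch-style sketch of one SMA training step. The method names (attention, apply_mask, encode, upsample) simply mirror the paper's notation and are placeholders, not the released SMA code.

```python
def sma_train_step(model, x, optimizer, loss_fn, mask_ratio=0.15):
    """One SMA update: self-guided masking from the model's own attention,
    followed by masked reconstruction. Placeholder method names throughout."""
    attn = model.attention(x)                    # A <- Attention(theta; X), Sec. 3.3
    masked = model.apply_mask(attn, mask_ratio)  # A_hat <- ApplyMask(A, r), Eq. 5
    hidden = model.encode(masked, x)             # H <- Encode(theta; A_hat, X), Sec. 4.2
    recon = model.upsample(hidden)               # O <- Upsample(theta; H), Eq. 6
    loss = loss_fn(recon, x)                     # l(O, X)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                             # theta <- theta - alpha * grad
    return loss.detach()
```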
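
The Dataset Splits row cites deepchem's scaffold splits for MoleculeNet. Below is a minimal sketch of obtaining such a split with the standard deepchem loader; the choice of the BBBP task and the ECFP featurizer are assumptions for illustration, not the paper's exact pipeline.

```python
import deepchem as dc

# Load a MoleculeNet task with deepchem's scaffold splitter (illustrative task choice).
tasks, (train, valid, test), transformers = dc.molnet.load_bbbp(
    featurizer="ECFP", splitter="scaffold"
)
print(f"train={len(train)}  valid={len(valid)}  test={len(test)}")
```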
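
Finally, the optimizer described in the Software Dependencies row (Adam with decoupled weight decay) can be set up with the Table 8b "guided" values as in the sketch below; the Linear module is a stand-in for the SMA encoder, not the paper's model.

```python
import torch

model = torch.nn.Linear(256, 256)   # stand-in for the SMA model
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=3e-4,                        # "guided" learning rate (Table 8b; "random" uses 1e-4)
    betas=(0.9, 0.999),             # beta1, beta2 as stated in the paper
    weight_decay=0.01,              # decoupled weight decay
)
```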