Scarf: Self-Supervised Contrastive Learning using Random Feature Corruption

Authors: Dara Bahri, Heinrich Jiang, Yi Tay, Donald Metzler

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We propose SCARF, a simple, widely-applicable technique for contrastive learning, where views are formed by corrupting a random subset of features. When applied to pre-train deep neural networks on the 69 real-world, tabular classification datasets from the OpenML-CC18 benchmark, SCARF not only improves classification accuracy in the fully-supervised setting but does so also in the presence of label noise and in the semi-supervised setting where only a fraction of the available training data is labeled. We show that SCARF complements existing strategies and outperforms alternatives like autoencoders. We conduct comprehensive ablations, detailing the importance of a range of factors.
Researcher Affiliation | Industry | Dara Bahri, Heinrich Jiang, Yi Tay, Donald Metzler; Google Research; {dbahri,heinrichj,yitay,metzler}@google.com
Pseudocode | Yes | Algorithm 1: SCARF pre-training algorithm (see the sketch after the table).
Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the described methodology.
Open Datasets | Yes | We use 69 datasets from the public OpenML-CC18 benchmark under the CC-BY licence. It consists of 72 real-world classification datasets that have been manually curated for effective benchmarking.
Dataset Splits | Yes | For each OpenML dataset, we form 70%/10%/20% train/validation/test splits, where a different split is generated for every trial and all methods use the same splits (sketched after the table).
Hardware Specification | No | Experiments were run on a cloud cluster of CPUs, and we used about one million CPU core hours in total for the experiments. (The description "cloud cluster of CPUs" is too general and does not provide specific model numbers or detailed specifications.)
Software Dependencies | Yes | We use the Python API, version 0.6, and choose the default settings for XGBClassifier (max depth of 3, 100 estimators, learning rate of 0.1).
Experiment Setup | Yes | We choose all three component models to be ReLU networks with hidden dimension 256. f consists of 4 layers, whereas both g and h have 2 layers. Both SCARF and the autoencoder baselines use g (for both pre-training and co-training, described later), but for autoencoders, the output dimensionality is the input feature dimensionality, and the mean-squared error reconstruction loss is applied. We train all models and their components with the Adam optimizer using the default learning rate of 0.001. For both pre-training and fine-tuning we use a batch size of 128. Unsupervised pre-training methods all use early stopping with patience 3 on the validation loss, unless otherwise noted. Supervised fine-tuning uses this same criterion (and validation split), but classification error is used as the validation metric for early stopping, as it performs slightly better. We set a max number of fine-tune epochs of 200 and pre-train epochs of 1000. We use 10 epochs to build the static validation set. Unless otherwise noted, we use a corruption rate c of 0.6 and a temperature τ of 1 for SCARF-based methods. All runs are repeated 30 times using different train/validation/test splits.
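
The Pseudocode and Experiment Setup rows above pin down most of the pre-training recipe (4-layer encoder f, 2-layer head g, hidden dimension 256, Adam at 0.001, batch size 128, corruption rate 0.6, temperature 1). The sketch below is our hedged illustration in PyTorch, not the authors' implementation: it assumes corrupted entries are resampled from each feature's empirical marginal, uses a Bernoulli(c) mask to approximate "corrupting a random subset of features", and uses InfoNCE over in-batch negatives; the class and function names are ours.

```python
# Minimal sketch of SCARF pre-training assembled from the rows above; not the
# authors' code. Assumptions (ours): corrupted entries are resampled from each
# feature's empirical marginal, a Bernoulli(c) mask approximates "corrupting a
# random subset of features", and the loss is InfoNCE over in-batch negatives.
import torch
import torch.nn as nn
import torch.nn.functional as F


def mlp(in_dim: int, hidden: int, n_layers: int) -> nn.Sequential:
    layers, d = [], in_dim
    for _ in range(n_layers):
        layers += [nn.Linear(d, hidden), nn.ReLU()]
        d = hidden
    return nn.Sequential(*layers)


class SCARFPretrainer(nn.Module):
    """Encoder f (4 layers) + pre-training head g (2 layers), hidden dim 256."""

    def __init__(self, n_features: int, hidden: int = 256,
                 corruption_rate: float = 0.6, tau: float = 1.0):
        super().__init__()
        self.f = mlp(n_features, hidden, n_layers=4)
        self.g = mlp(hidden, hidden, n_layers=2)
        self.c, self.tau = corruption_rate, tau

    def corrupt(self, x: torch.Tensor, x_train: torch.Tensor) -> torch.Tensor:
        # Replace roughly a fraction c of each row's features with values drawn
        # from the same feature's column of the training matrix (its marginal).
        b, d = x.shape
        mask = torch.rand(b, d, device=x.device) < self.c
        rows = torch.randint(0, x_train.size(0), (b, d), device=x.device)
        marginal_samples = x_train[rows, torch.arange(d, device=x.device)]
        return torch.where(mask, marginal_samples, x)

    def forward(self, x: torch.Tensor, x_train: torch.Tensor) -> torch.Tensor:
        z_anchor = F.normalize(self.g(self.f(x)), dim=1)
        z_positive = F.normalize(self.g(self.f(self.corrupt(x, x_train))), dim=1)
        logits = z_anchor @ z_positive.t() / self.tau   # in-batch negatives
        labels = torch.arange(x.size(0), device=x.device)
        return F.cross_entropy(logits, labels)          # InfoNCE loss


# One pre-training step with the quoted optimizer and batch size.
if __name__ == "__main__":
    x_train = torch.rand(1000, 20)                # stand-in tabular data
    model = SCARFPretrainer(n_features=20)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    batch = x_train[torch.randperm(1000)[:128]]   # batch size 128
    loss = model(batch, x_train)
    opt.zero_grad(); loss.backward(); opt.step()
```

In a full run, the encoder f would then be fine-tuned with a supervised head under the early-stopping criteria quoted in the Experiment Setup row.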
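
Similarly, the Dataset Splits and Software Dependencies rows translate into a short evaluation-setup snippet. This is only an illustration under our assumptions (scikit-learn's `train_test_split` for the partition, random stand-in data, a per-trial `seed`); the paper does not state which splitting utility was used.

```python
# Sketch of the 70%/10%/20% train/validation/test split and the quoted
# XGBClassifier settings; the splitting utility, variable names, and data
# here are our assumptions, not the authors' code.
import numpy as np
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))           # stand-in feature matrix
y = rng.integers(0, 2, size=1000)         # stand-in labels
seed = 0                                  # a different seed is drawn per trial

# 20% test first, then 10% of the full data (12.5% of the remainder) for validation.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.20, random_state=seed)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.125, random_state=seed)

xgb = XGBClassifier(max_depth=3, n_estimators=100, learning_rate=0.1)
xgb.fit(X_train, y_train)
print("validation accuracy:", xgb.score(X_val, y_val))
```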