Feature Dropout: Revisiting the Role of Augmentations in Contrastive Learning

Authors: Alex Tamkin, Margalit Glasgow, Xiluo He, Noah Goodman

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform contrastive learning experiments on a range of image and audio datasets with multiple downstream tasks (e.g. synthetic datasets combining two classes, such as images and digits, and naturalistic datasets labeled with dozens of attributes). ... Finally, we formalize the intuition that feature dropout can aid learning with a theoretical analysis of a simple linear contrastive setting.
Researcher Affiliation | Academia | Alex Tamkin, Margalit Glasgow, Xiluo He, Noah Goodman (Stanford University)
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is open-sourced at https://github.com/xiluohe/feature-dropout.
Open Datasets | Yes | The three image datasets are based on the canonical CIFAR-10 image-recognition dataset [28] (MIT License)... The CIFAR-10 images are overlaid with four copies of a randomly-sampled digit from the MNIST dataset... The audio dataset is created by overlaying the audio of a spoken digit (from the AudioMNIST dataset [3], MIT License)... To further validate the behavior of viewmaker on realistic multi-feature datasets, we consider the CelebA [32] dataset... (see the overlay sketch below)
Dataset Splits | No | The paper describes training epochs, batch sizes, and optimizers for pretraining and linear evaluation, but it does not explicitly specify the training, validation, and test splits, as percentages or counts, needed to reproduce the data partitioning.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions software components such as the SimCLR algorithm, the ResNet-18 model, SpecAug, WaveAug, and SGD, but does not provide version numbers for these or any other software dependencies.
Experiment Setup | Yes | We pretrain with the SimCLR algorithm for 200 epochs with a batch size of 256 and a temperature of 0.1. ... We train for 100 epochs, using the same parameters as Tamkin et al. [49], using SGD with learning rate 0.01, momentum 0.9, weight decay 0, and batch size 128. ... we use a budget of ϵ = 0.05P for the image datasets, and ϵ = 0.125P for the audio datasets, where P is the number of pixels in the input. (see the config sketch below)
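The Open Datasets row describes synthetic datasets built by overlaying four copies of a randomly sampled MNIST digit onto each CIFAR-10 image, so that every example carries two independent features (the object class and the digit class). Below is a minimal sketch of one way such an overlaid dataset could be built, assuming PyTorch and torchvision; the quadrant placement, digit scaling, and additive blending are illustrative assumptions, not the authors' exact construction (their repository contains the real recipe).

```python
import torch
import torch.nn.functional as F
from torch.utils.data import Dataset
from torchvision import datasets, transforms


class CifarMnistOverlay(Dataset):
    """Each CIFAR-10 image overlaid with four copies of one randomly sampled MNIST digit."""

    def __init__(self, root="data", train=True):
        self.cifar = datasets.CIFAR10(root, train=train, download=True,
                                      transform=transforms.ToTensor())
        self.mnist = datasets.MNIST(root, train=train, download=True,
                                    transform=transforms.ToTensor())

    def __len__(self):
        return len(self.cifar)

    def __getitem__(self, idx):
        image, object_label = self.cifar[idx]              # (3, 32, 32), values in [0, 1]
        digit_idx = torch.randint(len(self.mnist), (1,)).item()
        digit, digit_label = self.mnist[digit_idx]         # (1, 28, 28)

        # Shrink the digit to one quadrant of the 32x32 image (assumed sizing).
        digit = F.interpolate(digit.unsqueeze(0), size=(16, 16),
                              mode="bilinear", align_corners=False).squeeze(0)

        # Paste the same digit into all four quadrants (additive overlay, clamped).
        image = image.clone()
        for top in (0, 16):
            for left in (0, 16):
                patch = image[:, top:top + 16, left:left + 16]
                image[:, top:top + 16, left:left + 16] = (patch + digit).clamp(0, 1)

        # Two downstream labels per example: the background object and the digit.
        return image, (object_label, digit_label)
```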
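The Experiment Setup row quotes the pretraining and linear-evaluation hyperparameters. The sketch below simply collects those quoted values into a runnable configuration, assuming PyTorch; `linear_probe` is a hypothetical placeholder for the linear classification head, and the SimCLR loss and viewmaker training loop are not reproduced here.

```python
import torch

# Pretraining settings quoted in the Experiment Setup row.
PRETRAIN = {
    "algorithm": "SimCLR",
    "epochs": 200,
    "batch_size": 256,
    "temperature": 0.1,
}

# Linear-evaluation settings quoted in the Experiment Setup row.
LINEAR_EVAL = {
    "epochs": 100,
    "batch_size": 128,
    "lr": 0.01,
    "momentum": 0.9,
    "weight_decay": 0.0,
}


def perturbation_budget(num_pixels: int, audio: bool = False) -> float:
    """Quoted budget: eps = 0.05 * P for images and 0.125 * P for audio,
    where P is the number of pixels in the input."""
    return (0.125 if audio else 0.05) * num_pixels


def linear_eval_optimizer(linear_probe: torch.nn.Module) -> torch.optim.SGD:
    # SGD with the quoted learning rate, momentum, and weight decay.
    return torch.optim.SGD(linear_probe.parameters(),
                           lr=LINEAR_EVAL["lr"],
                           momentum=LINEAR_EVAL["momentum"],
                           weight_decay=LINEAR_EVAL["weight_decay"])
```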