Temperature Schedules for self-supervised contrastive methods on long-tail data
Authors: Anna Kukleva, Moritz Böhle, Bernt Schiele, Hilde Kuehne, Christian Rupprecht
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we validate our hypothesis that simple manipulations of the temperature parameter in Eq. (1) lead to better performance for long-tailed data. First, we introduce our experimental setup in Sec. 4.1, then in Sec. 4.2 we discuss the results across three imbalanced datasets and, finally, we analyse different design choices of the framework through extensive ablation studies in Sec. 4.3. |
| Researcher Affiliation | Collaboration | 1 MPI for Informatics, Saarland Informatics Campus, 2 Goethe University Frankfurt, 3 MIT-IBM Watson AI Lab, 4 University of Oxford |
| Pseudocode | Yes | Algorithm 1 Cosine Schedule. Require: period T > 0, τ− = 0.1, τ+ = 1.0; ep ← current epoch; τ ← (τ+ − τ−) · (1 + np.cos(2 · np.pi · ep / T)) / 2 + τ−. "Insert algorithm 1 into your favourite contrastive learning framework to check it out!" A runnable sketch of this schedule is given below the table. |
| Open Source Code | Yes | Code available at: github.com/annusha/temperatureschedules |
| Open Datasets | Yes | We consider long-tailed (LT) versions of the following three popular datasets for the experiments: CIFAR10-LT, CIFAR100-LT, and ImageNet100-LT. For most of the experiments, we follow the setting from SDCLR (Jiang et al., 2021). In case of CIFAR10-LT/CIFAR100-LT, the original datasets (Krizhevsky et al., 2009) consist of 60000 32x32 images sampled uniformly from 10 and 100 semantic classes, respectively, where 50000 images correspond to the training set and 10000 to the test set. Long-tail versions of the datasets are introduced by Cui et al. (2019) and consist of a subset of the original datasets with an exponential decay in the number of images per class (a sketch of this subsampling is given below the table). ImageNet100-LT is a subset of the original ImageNet-100 (Tian et al., 2020a) consisting of 100 classes for a total of 12.21k 256x256 images. |
| Dataset Splits | Yes | Following Jiang et al. (2021), we separate 5000 images for CIFAR10/100-LT as a validation set for each split. |
| Hardware Specification | No | No specific hardware details such as GPU models (e.g., NVIDIA A100, Tesla V100), CPU models, or cloud instances are mentioned for running the experiments. |
| Software Dependencies | No | The paper mentions several components like SGD optimizer, ResNet, MoCo, SimCLR, and implies the use of libraries (e.g., np.cos for NumPy), but does not provide specific version numbers for any software dependencies (e.g., "PyTorch 1.9" or "Python 3.8"). |
| Experiment Setup | Yes | We use an SGD optimizer for all experiments with a weight decay of 1e-4. As for the learning rate, we utilize linear warm-up for 10 epochs that is followed by a cosine annealing schedule starting from 0.5. We train for 2000 epochs for CIFAR10-LT and CIFAR100-LT and 800 epochs for ImageNet100-LT. For CIFAR10-LT and CIFAR100-LT we use a ResNet18 (He et al., 2016) backbone. SimCLR details: we train with a batch size of 512 and a projection head that has two layers with an output size of 128. Regarding the proposed temperature schedules (TS), we use a period length of T = 400 with τ+ = 1.0 and τ− = 0.1 if not stated otherwise (an illustrative optimizer/schedule sketch is given below the table). |
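
The cosine schedule in Algorithm 1 maps the current epoch to a temperature that oscillates between τ+ and τ−. Below is a minimal runnable sketch; the defaults T = 400, τ− = 0.1, τ+ = 1.0 follow the table, while the function name `cosine_temperature` and the wrapper itself are illustrative rather than the authors' code.

```python
import numpy as np


def cosine_temperature(epoch: int, period: int = 400,
                       tau_minus: float = 0.1, tau_plus: float = 1.0) -> float:
    """Temperature for the current epoch, oscillating between tau_plus and tau_minus.

    Starts at tau_plus (epoch 0), reaches tau_minus at period / 2,
    and returns to tau_plus after one full period.
    """
    return (tau_plus - tau_minus) * (1 + np.cos(2 * np.pi * epoch / period)) / 2 + tau_minus


# Example values over one period of a CIFAR10-LT run (period = 400 epochs):
for ep in (0, 100, 200, 400):
    print(ep, round(cosine_temperature(ep), 3))   # 1.0, 0.55, 0.1, 1.0
```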
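
The "exponential decay in the number of images per class" from Cui et al. (2019) can be sketched as follows. The imbalance factor of 100 and the per-class maximum of 5000 images (CIFAR10) are illustrative assumptions for this sketch and are not values quoted in the table.

```python
def long_tail_counts(num_classes: int = 10, max_per_class: int = 5000,
                     imbalance_factor: float = 100.0) -> list[int]:
    """Images kept per class, decaying exponentially from head (class 0) to tail."""
    counts = []
    for c in range(num_classes):
        # Class 0 keeps max_per_class images; the last class keeps
        # max_per_class / imbalance_factor images.
        frac = (1.0 / imbalance_factor) ** (c / (num_classes - 1))
        counts.append(int(max_per_class * frac))
    return counts


print(long_tail_counts())
# [5000, 2997, 1796, 1077, 645, 387, 232, 139, 83, 50]
```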
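
The experiment-setup row can be read as the PyTorch-style sketch below. Only the weight decay (1e-4), the 10 warm-up epochs, the base learning rate of 0.5, and the epoch counts come from the table; the choice of torch.optim schedulers, the momentum of 0.9, and the stand-in model are assumptions.

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import LambdaLR, CosineAnnealingLR, SequentialLR

model = torch.nn.Linear(128, 128)                        # stand-in for the ResNet18 backbone
total_epochs, warmup_epochs, base_lr = 2000, 10, 0.5     # CIFAR10/100-LT setting

optimizer = SGD(model.parameters(), lr=base_lr, momentum=0.9, weight_decay=1e-4)

# Linear warm-up for the first 10 epochs, then cosine annealing for the remainder.
warmup = LambdaLR(optimizer, lr_lambda=lambda e: (e + 1) / warmup_epochs)
cosine = CosineAnnealingLR(optimizer, T_max=total_epochs - warmup_epochs)
scheduler = SequentialLR(optimizer, schedulers=[warmup, cosine], milestones=[warmup_epochs])

for epoch in range(total_epochs):
    # ... one SimCLR/MoCo training epoch here, with the contrastive temperature
    # set to cosine_temperature(epoch) from the sketch above ...
    scheduler.step()
```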