Representation Learning via Consistent Assignment of Views over Random Partitions

Authors: Thalles Santos Silva, Adín Ramírez Rivera

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through an extensive evaluation, we demonstrate that CARP's representations are suitable for learning downstream tasks. We evaluate CARP's representations' capabilities on 17 datasets across many standard protocols, including linear evaluation, few-shot classification, k-NN, k-means, image retrieval, and copy detection. We compare CARP's performance to 11 existing self-supervised methods. We extensively ablate our method and demonstrate that our proposed random partition pretext task improves the quality of the learned representations by devising multiple random classification tasks.
Researcher Affiliation | Academia | Thalles Silva, Institute of Computing, University of Campinas, thalles.silva@students.ic.unicamp.br; Adín Ramírez Rivera, Department of Informatics, University of Oslo, adinr@uio.no
Pseudocode | Yes | Appendix E: Pseudocode of CARP in a PyTorch-like Style
Open Source Code | Yes | Code at https://sthalles.github.io/carp/.
Open Datasets | Yes | Table 2 reports clustering performance metrics of various clustering-based SSL methods on the ImageNet-1M [36], CIFAR-10/100 [27], and the GTSRB [40] datasets.
Dataset Splits | Yes | For ImageNet-1M evaluation, we trained a linear classifier on top of the frozen representations extracted from the last average pooling layer of the ResNet-50 encoder for 100 epochs, following Zhou et al.'s [49] protocol. ... We use the validation split to assess the quality of the learned prototypes.
Hardware Specification | Yes | For all experiments, we used 4 A100 40GB GPUs and gradient accumulation to simulate large batch sizes.
Software Dependencies | No | The paper mentions 'PyTorch-style pseudo-code' and the 'faiss library [25]', but does not specify version numbers for these or other software dependencies.
Experiment Setup | Yes | We train CARP on the ImageNet-1M unlabeled dataset using ResNet-50 [23] encoders. We take the output representation of the last global average pooling layer (a 2048-dim vector) and project it to a 256-dim vector. ... The hidden units of the projection head contain 2048 neurons. ... K = 65,536 prototypes. ... NP = 128, which creates subsets containing NB = 512 randomly chosen prototypes. ... CARP is pre-trained end to end with the LARS [47] optimizer and a weight decay of 1e-6. For models trained for up to 200 epochs, the learning rate starts at 0.6 and decays to 0.006 with a cosine schedule [30] without warmup. For models pre-trained for more than 400 epochs, the learning rate starts at 0.3 and decays to 0.003 using the same cosine schedule. We train the system with a global batch size of 4096 observations.
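
To make the quoted setup more concrete, the sketches below illustrate the pieces described in the table. Starting with the Experiment Setup row: the stated dimensions (2048-dim features, 2048 hidden units, 256-dim projection, K = 65,536 prototypes) and hyperparameters can be grouped as follows. The layer choices beyond the stated dimensions (BatchNorm, ReLU, a bias-free linear layer for the prototypes) and the dictionary grouping are our assumptions, not the authors' released code.

```python
import torch.nn as nn

# Projection head implied by the quoted setup: 2048 -> 2048 -> 256.
# BatchNorm/ReLU placement is an assumption on our part.
projection_head = nn.Sequential(
    nn.Linear(2048, 2048),
    nn.BatchNorm1d(2048),
    nn.ReLU(inplace=True),
    nn.Linear(2048, 256),
)

# K = 65,536 prototypes, one weight row per prototype (a common SSL choice).
prototypes = nn.Linear(256, 65536, bias=False)

# Hyperparameters quoted in the Experiment Setup row; the grouping is ours.
hparams = {
    "num_partitions": 128,              # NP: random partitions per step
    "block_size": 512,                  # NB: prototypes per partition (128 * 512 = 65,536)
    "weight_decay": 1e-6,
    "lr_up_to_200_epochs": (0.6, 0.006),    # cosine decay, no warmup
    "lr_400plus_epochs": (0.3, 0.003),
    "global_batch_size": 4096,
}
```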
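The Pseudocode row notes that the paper provides its own PyTorch-style listing (Appendix E); since that listing is not reproduced here, the following is only a rough sketch of how a consistency loss over random partitions of the prototypes could look. The function name, temperature, stop-gradient placement, and use of a symmetric cross-entropy term are our assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def random_partition_loss(z1, z2, prototypes, num_blocks=128, block_size=512,
                          temperature=0.1):
    """Illustrative consistency loss over random partitions of the prototypes.

    z1, z2:      L2-normalized projections of two views, shape (B, 256)
    prototypes:  prototype matrix, shape (K, 256), with
                 K = num_blocks * block_size (e.g. 128 * 512 = 65,536)
    """
    K = prototypes.shape[0]
    assert K == num_blocks * block_size
    perm = torch.randperm(K, device=prototypes.device)   # fresh random partition
    loss = 0.0
    for b in range(num_blocks):
        idx = perm[b * block_size:(b + 1) * block_size]
        block = F.normalize(prototypes[idx], dim=1)       # (block_size, 256)
        logits1 = z1 @ block.t() / temperature            # (B, block_size)
        logits2 = z2 @ block.t() / temperature
        p1, p2 = logits1.softmax(dim=1), logits2.softmax(dim=1)
        # Symmetric cross-view consistency: each view predicts the other's
        # (stop-gradient) soft assignment within this block of prototypes.
        loss = loss - 0.5 * (p2.detach() * F.log_softmax(logits1, dim=1)).sum(1).mean()
        loss = loss - 0.5 * (p1.detach() * F.log_softmax(logits2, dim=1)).sum(1).mean()
    return loss / num_blocks
```

The prototype weights from the previous sketch (`prototypes.weight`) could serve as the `prototypes` argument here.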
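The Hardware Specification row states that gradient accumulation was used to simulate large batch sizes on 4 A100 GPUs. The snippet below is a generic sketch of that technique; the per-GPU micro-batch of 256 and the 4 accumulation steps are hypothetical numbers chosen only to reach the reported global batch size of 4096, and `model`, `optimizer`, and `train_loader` are assumed to be already constructed.

```python
# With 4 GPUs and a hypothetical 256 images per GPU per forward pass,
# accumulating over 4 steps gives 4 * 256 * 4 = 4096 samples per update.
accum_steps = 4

optimizer.zero_grad()
for step, (views, _) in enumerate(train_loader):
    loss = model(views) / accum_steps      # scale so accumulated grads average
    loss.backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()                   # one update per simulated large batch
        optimizer.zero_grad()
```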
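The Dataset Splits row quotes the linear evaluation protocol: a linear classifier trained for 100 epochs on frozen 2048-dim features, following Zhou et al.'s protocol. The sketch below captures that procedure at a high level; the SGD optimizer, learning rate, and cosine schedule are our placeholder choices, not the exact settings of that protocol.

```python
import torch
import torch.nn as nn

def linear_evaluation(encoder, train_loader, num_classes=1000, epochs=100, device="cuda"):
    """Train a linear probe on frozen encoder features (illustrative sketch)."""
    encoder.eval()                         # backbone stays frozen throughout
    for p in encoder.parameters():
        p.requires_grad = False

    classifier = nn.Linear(2048, num_classes).to(device)
    optimizer = torch.optim.SGD(classifier.parameters(), lr=0.1, momentum=0.9)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    criterion = nn.CrossEntropyLoss()

    for _ in range(epochs):
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            with torch.no_grad():
                feats = encoder(images)    # frozen 2048-dim average-pool features
            loss = criterion(classifier(feats), labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        scheduler.step()
    return classifier
```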
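Finally, the Research Type row lists k-NN classification among the evaluation protocols. The function below is a minimal, generic weighted k-NN probe on frozen features, included only to make that protocol concrete; the function name, k = 20, and similarity-weighted voting are our choices, not the paper's configuration.

```python
import torch
import torch.nn.functional as F

def knn_classify(train_feats, train_labels, test_feats, k=20, num_classes=1000):
    """Weighted k-NN probe on frozen features (illustrative sketch)."""
    train_feats = F.normalize(train_feats, dim=1)     # cosine-similarity space
    test_feats = F.normalize(test_feats, dim=1)
    sims = test_feats @ train_feats.t()               # (N_test, N_train)
    topk_sims, topk_idx = sims.topk(k, dim=1)
    topk_labels = train_labels[topk_idx]              # (N_test, k), int64 labels
    votes = torch.zeros(test_feats.size(0), num_classes, device=test_feats.device)
    votes.scatter_add_(1, topk_labels, topk_sims)     # similarity-weighted votes
    return votes.argmax(dim=1)                        # predicted class per sample
```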