Representation Learning by Detecting Incorrect Location Embeddings

Authors: Sepehr Sameni, Simon Jenni, Paolo Favaro

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the use of DILEMMA on several datasets, compare it to state-of-the-art (SotA) SSL baselines, and perform ablations to show the role of each loss component. In each table where we compare to an SSL baseline, we indicate the baseline with a method name (e.g., MoCo v3 (Chen, Xie, and He 2021)) and use a +{DILEMMA/sparsity} to indicate that the baseline immediately above is combined with just sparsity or with the DILEMMA loss, which includes sparsity. We compare these two cases to show the added benefit of the DILEMMA positional classification loss over the lone sparsity.
Researcher Affiliation | Collaboration | (1) Computer Vision Group, University of Bern, Switzerland; (2) Adobe Research
Pseudocode | No | The paper describes the training process and losses using text and mathematical equations, but it does not include a formal pseudocode or algorithm block. (A hedged sketch of the positional-classification objective, reconstructed from the paper's description, is given after this table.)
Open Source Code | Yes | Source code: https://github.com/Separius/DILEMMA
Open Datasets | Yes | For our main model, we pre-train DILEMMA on ImageNet-1K (Deng et al. 2009)... We evaluate DILEMMA on several datasets, compare it to state-of-the-art (SotA) SSL baselines, and perform ablations to show the role of each loss component... Aircraft (Maji et al. 2013), Caltech101 (Fei-Fei, Fergus, and Perona 2004), Cars (Krause et al. 2013), CIFAR10 (Krizhevsky 2009), CIFAR100 (Krizhevsky 2009), DTD (Cimpoi et al. 2014), Flowers102 (Nilsback and Zisserman 2008), Food101 (Bossard, Guillaumin, and Van Gool 2014), INat19 (iNaturalist 2019), Pets (Parkhi et al. 2012), STL10 (Coates, Ng, and Lee 2011), SVHN (Netzer et al. 2011), and Yoga82 (Verma et al. 2020).
Dataset Splits | Yes | With reference to ImageNet, we use the model pre-trained on the whole unlabeled dataset, train a linear layer on top of the frozen features of the 1% or 10% subsets (Chen et al. 2020a), and then evaluate the results on the whole validation set. (A sketch of this linear-probe protocol follows the table.)
Hardware Specification | Yes | For our main model, we pre-train DILEMMA on ImageNet-1K (Deng et al. 2009) with the exact same hyper-parameters as MoCo v3, using three GeForce RTX 3090 GPUs for 100 epochs... To show the efficiency of the proposed method, we ran SimCLR, MoCo v3, and DINO with and without multi-crop on 4 GPUs and reported the epoch times in Table 13.
Software Dependencies | No | The paper does not explicitly provide specific software versions for dependencies such as Python, PyTorch, or other libraries. It mentions using Vision Transformers (ViT) and UPerNet, but without version numbers.
Experiment Setup | Yes | For our main model, we pre-train DILEMMA on ImageNet-1K (Deng et al. 2009) with the exact same hyper-parameters as MoCo v3, using three GeForce RTX 3090 GPUs for 100 epochs with a base batch size of 345. We set λ_DILEMMA to 0.4 and the probability of positional embedding mismatch θ = 0.2. We use sparsity ratios of 0%, 40%, 55%, and 65% with 1×, 2×, 3×, and 4× the base batch size, respectively, and disable the DILEMMA loss when the input is dense. (A sketch of this schedule follows the table.)
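
Since the paper provides no algorithm block, the following is a minimal sketch of the positional-classification objective as we read it from the text: a fraction θ = 0.2 of the patch tokens receives a randomly drawn (and therefore usually wrong) positional embedding, and a small per-token binary head is trained to detect the mismatched tokens. All names here (`corrupt_positions`, `dilemma_loss`, `head`) are our own; this is not the authors' implementation, which is at https://github.com/Separius/DILEMMA.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def corrupt_positions(patch_tokens, pos_embed, theta=0.2):
    """Add positional embeddings to patch tokens, replacing the position of
    a random ~theta fraction of tokens with a randomly drawn (usually wrong)
    one. Returns the corrupted tokens and per-token binary targets
    (1 = this token received a mismatched positional embedding)."""
    B, N, D = patch_tokens.shape
    device = patch_tokens.device
    mismatch = torch.rand(B, N, device=device) < theta
    pos = torch.arange(N, device=device).expand(B, N).clone()
    rand_pos = torch.randint(0, N, (B, N), device=device)
    pos[mismatch] = rand_pos[mismatch]  # may coincide with the true position with prob 1/N
    tokens = patch_tokens + pos_embed.squeeze(0)[pos]  # (B, N, D)
    return tokens, mismatch.float()

def dilemma_loss(token_features, targets, head):
    """Per-token binary cross-entropy: detect which tokens carry an
    incorrect location embedding."""
    logits = head(token_features).squeeze(-1)  # (B, N)
    return F.binary_cross_entropy_with_logits(logits, targets)

# Usage sketch (the transformer blocks are elided; in the real model the
# corrupted tokens would pass through the ViT before the head is applied).
B, N, D = 8, 196, 768
patch_tokens = torch.randn(B, N, D)   # patch embeddings before positions are added
pos_embed = torch.randn(1, N, D)      # learned positional embeddings
head = nn.Linear(D, 1)                # tiny per-token classification head

tokens, targets = corrupt_positions(patch_tokens, pos_embed, theta=0.2)
features = tokens                     # in the real model: features = vit_blocks(tokens)
loss = dilemma_loss(features, targets, head)
```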
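For the semi-supervised splits, the evaluation amounts to a standard linear probe on frozen features. Below is a hypothetical sketch of that protocol; the `backbone.num_features` attribute and the loader are assumptions in the timm style, not the authors' evaluation code.

```python
import torch
import torch.nn as nn

def linear_probe(backbone, train_loader, num_classes,
                 epochs=10, lr=0.1, device="cuda"):
    """Fit a linear classifier on frozen features: pre-train on all of
    (unlabeled) ImageNet, train the probe on the 1% or 10% labeled subset,
    then evaluate on the full validation set."""
    backbone.eval().to(device)
    for p in backbone.parameters():
        p.requires_grad_(False)           # features stay frozen
    probe = nn.Linear(backbone.num_features, num_classes).to(device)
    opt = torch.optim.SGD(probe.parameters(), lr=lr, momentum=0.9)
    ce = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            with torch.no_grad():
                feats = backbone(images)  # frozen feature extraction
            loss = ce(probe(feats), labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return probe
```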
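The sparsity schedule in the setup row pairs more aggressive token dropping with a proportionally larger batch and skips the DILEMMA loss on dense inputs. Here is a sketch under our assumptions; in particular, the paper does not say how the four configurations are chosen per step, so the uniform sampling in `step_config` is a guess.

```python
import random
import torch

BASE_BATCH = 345           # base batch size reported in the paper
LAMBDA_DILEMMA = 0.4       # weight of the DILEMMA loss
SCHEDULE = [               # (sparsity ratio, batch-size multiplier)
    (0.00, 1),             # dense input: DILEMMA loss disabled
    (0.40, 2),
    (0.55, 3),
    (0.65, 4),
]

def sparsify(tokens, sparsity):
    """Keep a random (1 - sparsity) fraction of the patch tokens."""
    B, N, D = tokens.shape
    keep = max(1, round(N * (1.0 - sparsity)))
    idx = torch.rand(B, N, device=tokens.device).argsort(dim=1)[:, :keep]
    return tokens.gather(1, idx.unsqueeze(-1).expand(-1, -1, D))

def step_config():
    """Sample a sparsity level; dropping tokens frees memory, which the
    schedule trades for a proportionally larger batch."""
    sparsity, mult = random.choice(SCHEDULE)
    return sparsity, BASE_BATCH * mult

def total_loss(ssl_loss, dilemma_loss, sparsity):
    # The positional-classification loss is only applied on sparse inputs.
    return ssl_loss if sparsity == 0.0 else ssl_loss + LAMBDA_DILEMMA * dilemma_loss
```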