Neural Harmonics: Bridging Spectral Embedding and Matrix Completion in Self-Supervised Learning

Authors: Marina Munkhoeva, Ivan Oseledets

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | First, we verify that the performance of the proposed formulation in (4) and the corresponding loss function, denoted RQMIN, is at least on par with state-of-the-art methods. We then study the effect of the complexity of the projection head on the incoherence and its connection with the downstream performance of the backbone versus the projection head outputs. Here we report the training hyperparameters for all of the experiments. As VICReg is extremely sensitive to the choice of hyperparameters (e.g. increasing the learning rate with increased batch size negatively affects training: learning diverges), we adopt the same hyperparameters for training RQMIN for a fair comparison. We follow the standard VICReg protocol adopted and finetuned for CIFAR-10/100 and ImageNet-100 in solo-learn [12], a library of self-supervised learning methods for visual representation learning. We train a ResNet-18 backbone architecture with a 3-layer MLP projection head (hidden dimensions 2048-2048-2048). The batch size is 256 for the CIFAR datasets and 512 for ImageNet-100. For pretraining, the learning rate schedule is a linear warm-up for 10 epochs followed by cosine annealing, and the optimizer is LARS with learning rate 0.3. For linear probe training, we use SGD with a step learning rate schedule with steps at 60 and 80 epochs. The number of pre-training epochs is 1000 for CIFAR and 400 for ImageNet-100; downstream training runs for 100 epochs.
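
The pretraining recipe quoted above can be sketched in a few lines of PyTorch. This is a minimal, hedged sketch only: the paper trains with LARS via the solo-learn library, and SGD is used here purely as a self-contained stand-in; the class and variable names (backbone, projector, lr_lambda) are illustrative, not taken from the authors' code.

import math
import torch
import torch.nn as nn
from torchvision.models import resnet18

# ResNet-18 trunk with the classification head removed, exposing 512-d features
backbone = resnet18()
backbone.fc = nn.Identity()

# 3-layer MLP projection head with hidden dimensions 2048-2048-2048
projector = nn.Sequential(
    nn.Linear(512, 2048), nn.BatchNorm1d(2048), nn.ReLU(inplace=True),
    nn.Linear(2048, 2048), nn.BatchNorm1d(2048), nn.ReLU(inplace=True),
    nn.Linear(2048, 2048),
)

params = list(backbone.parameters()) + list(projector.parameters())
# Paper: LARS optimizer with learning rate 0.3; SGD is a stand-in here
optimizer = torch.optim.SGD(params, lr=0.3, momentum=0.9)

warmup_epochs, total_epochs = 10, 1000   # 1000 epochs for CIFAR, 400 for ImageNet-100

def lr_lambda(epoch: int) -> float:
    """Linear warm-up for 10 epochs, then cosine annealing to zero."""
    if epoch < warmup_epochs:
        return (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)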
Researcher Affiliation | Collaboration | Correspondence to marina.munkhoeva@tuebingen.mpg.de; Max Planck Institute for Intelligent Systems, Tübingen, Germany; Artificial Intelligence Research Institute (AIRI), Skolkovo Institute of Science and Technology (Skoltech), Moscow, Russian Federation
Pseudocode | No | The paper describes algorithms and formulations but does not present any formal pseudocode or algorithm blocks.
Open Source Code | No | The paper mentions using the third-party 'solo-learn' library [12] for visual representation learning, but it does not state that the authors release their own code for the methods described in this paper.
Open Datasets | Yes | We follow the standard VICReg protocol adopted and finetuned for CIFAR-10/100 and ImageNet-100 in solo-learn [12], a library of self-supervised learning methods for visual representation learning. We train a ResNet-18 backbone architecture... The batch size is 256 for the CIFAR datasets and 512 for ImageNet-100.
Dataset Splits | Yes | Mean and standard deviation for validation set accuracy across 5-10 runs for CIFAR-10, CIFAR-100 and ImageNet-100... We embed the training set of ImageNet-100 to get a representations matrix A ∈ ℝ^(125952×512) and compute incoherence µ(A) using the effective rank r_e(A).
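
The incoherence computation referenced in the excerpt above can be illustrated with a short NumPy sketch. The paper's exact definitions of µ(A) and r_e(A) are not reproduced here; this sketch assumes the entropy-based effective rank (Roy & Vetterli, 2007) and the standard matrix-completion incoherence of the left singular subspace, with the effective rank substituting for the exact rank.

import numpy as np

def effective_rank(A: np.ndarray) -> float:
    """exp of the entropy of the normalized singular-value distribution."""
    s = np.linalg.svd(A, compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]
    return float(np.exp(-(p * np.log(p)).sum()))

def incoherence(A: np.ndarray) -> float:
    """mu(A) = (n / r) * max_i ||U[i, :r]||^2 with r = rounded effective rank."""
    n = A.shape[0]
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    r = max(1, int(round(effective_rank(A))))
    row_norms = (U[:, :r] ** 2).sum(axis=1)   # leverage scores of the rows
    return float(n / r * row_norms.max())

# e.g. A of shape (125952, 512): backbone embeddings of the ImageNet-100 train set
# mu = incoherence(A)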
Hardware Specification | No | The paper mentions using a 'ResNet-18 backbone architecture' and discusses training on 'CIFAR datasets and ImageNet-100', but it does not specify any hardware details such as GPU or CPU models used for the experiments.
Software Dependencies | No | The paper mentions using the 'solo-learn' library [12], but it does not provide specific version numbers for this or any other software dependencies.
Experiment Setup | Yes | We train a ResNet-18 backbone architecture with a 3-layer MLP projection head (hidden dimensions 2048-2048-2048). The batch size is 256 for the CIFAR datasets and 512 for ImageNet-100. For pretraining, the learning rate schedule is a linear warm-up for 10 epochs followed by cosine annealing, and the optimizer is LARS with learning rate 0.3. For linear probe training, we use SGD with a step learning rate schedule with steps at 60 and 80 epochs. The number of pre-training epochs is 1000 for CIFAR and 400 for ImageNet-100; downstream training runs for 100 epochs.
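
The linear-probe part of the quoted setup (SGD with a step schedule at epochs 60 and 80, 100 downstream epochs) maps directly onto PyTorch's MultiStepLR. This is a sketch under assumptions: the excerpt specifies only the optimizer, the step epochs, and the epoch count, so the learning rate, decay factor, and momentum values below are illustrative.

import torch
import torch.nn as nn

num_features, num_classes = 512, 100   # ResNet-18 features; e.g. ImageNet-100 classes
linear_probe = nn.Linear(num_features, num_classes)

# Assumed lr/momentum/gamma; only the optimizer type and step epochs are given in the text
optimizer = torch.optim.SGD(linear_probe.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[60, 80], gamma=0.1)

# for epoch in range(100):
#     train_one_epoch(linear_probe, frozen_backbone, train_loader, optimizer)  # hypothetical helper
#     scheduler.step()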