Neural Harmonics: Bridging Spectral Embedding and Matrix Completion in Self-Supervised Learning
Authors: Marina Munkhoeva, Ivan Oseledets
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | First, we verify that the performance of the proposed formulation in (4) and the corresponding loss function, denoted RQMIN, is at least on par with the state-of-the-art methods. We then study the effect of the complexity of the projection head on the incoherence and its connection with the downstream performance of the backbone against projection head outputs. Here we report the training hyperparameters for all of the experiments. As VICReg is extremely sensitive to the choice of hyperparameters (e.g. increasing the learning rate with increased batch size negatively affects training: learning diverges), we adopt the same hyperparameters for training RQMIN for a fair comparison. We follow the standard VICReg protocol adopted and finetuned for CIFAR-10/100 and ImageNet-100 in the library for self-supervised learning methods for visual representation learning, solo-learn [12]. We train a ResNet-18 backbone architecture with a 3-layer MLP projection head (respective hidden dimensions: 2048-2048-2048). The batch size is 256 for CIFAR datasets and 512 for ImageNet-100. For pretraining, the learning rate schedule is linear warm-up for 10 epochs and cosine annealing; the optimizer is LARS with learning rate 0.3. For linear probe training, SGD with a step learning rate schedule with steps at 60 and 80 epochs. The number of pre-training epochs is 1000 for CIFAR and 400 for ImageNet-100, downstream training 100 epochs. |
| Researcher Affiliation | Collaboration | Correspondence to marina.munkhoeva@tuebingen.mpg.de; Max Planck Institute for Intelligent Systems, Tübingen, Germany; Artificial Intelligence Research Institute (AIRI), Skolkovo Institute of Science and Technology (Skoltech), Moscow, Russian Federation |
| Pseudocode | No | The paper describes algorithms and formulations but does not present any formal pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions using 'solo-learn' library [12] for visual representation learning, which is a third-party tool, but it does not state that the authors are releasing their own code for the methods described in this paper. |
| Open Datasets | Yes | We follow the standard VICReg protocol adopted and finetuned for CIFAR-10/100 and ImageNet-100 in the library for self-supervised learning methods for visual representation learning, solo-learn [12]. We train a ResNet-18 backbone architecture... The batch size is 256 for CIFAR datasets and 512 for ImageNet-100. |
| Dataset Splits | Yes | Mean and standard deviation for validation set accuracy across 5-10 runs for CIFAR-10, CIFAR-100 and ImageNet-100... We embed the training set of ImageNet-100 to get representations matrix A ∈ ℝ^{125952×512} and compute incoherence µ(A) using effective rank r_e(A). |
| Hardware Specification | No | The paper mentions using a 'ResNet-18 backbone architecture' and discusses training on 'CIFAR datasets and ImageNet-100', but it does not specify any hardware details such as GPU or CPU models used for the experiments. |
| Software Dependencies | No | The paper mentions using the 'solo-learn' library [12], but it does not provide specific version numbers for this or any other software dependencies. |
| Experiment Setup | Yes | We train a ResNet-18 backbone architecture with a 3-layer MLP projection head (respective hidden dimensions: 2048-2048-2048). The batch size is 256 for CIFAR datasets and 512 for ImageNet-100. For pretraining, the learning rate schedule is linear warm-up for 10 epochs and cosine annealing; the optimizer is LARS with learning rate 0.3. For linear probe training, SGD with a step learning rate schedule with steps at 60 and 80 epochs. The number of pre-training epochs is 1000 for CIFAR and 400 for ImageNet-100, downstream training 100 epochs. |
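
The Experiment Setup row above specifies the pretraining and linear-probe hyperparameters, but no code is released. The following is a minimal PyTorch sketch of that configuration under stated assumptions: it uses plain torchvision and torch.optim components, substitutes vanilla SGD for the LARS optimizer used in the paper (available in libraries such as solo-learn but not in core PyTorch), and assumes the warm-up start factor, momentum, weight decay, probe learning rate, and step-decay factor, none of which appear in the quoted text.

```python
# Minimal sketch of the reported setup, not the authors' code: ResNet-18 backbone,
# 3-layer MLP projection head (2048-2048-2048), 10-epoch linear warm-up + cosine
# annealing at base lr 0.3, and a linear probe with SGD stepped at epochs 60 and 80.
# LARS (used in the paper) is replaced by SGD here; momentum, weight decay,
# warm-up start factor, probe lr, and decay factor are assumed values.
import torch
import torch.nn as nn
from torchvision.models import resnet18

# Backbone: ResNet-18 with its classification head removed (512-d features).
backbone = resnet18()
backbone.fc = nn.Identity()
feat_dim, hidden = 512, 2048

# 3-layer MLP projection head with hidden dimensions 2048-2048-2048.
projector = nn.Sequential(
    nn.Linear(feat_dim, hidden), nn.BatchNorm1d(hidden), nn.ReLU(inplace=True),
    nn.Linear(hidden, hidden), nn.BatchNorm1d(hidden), nn.ReLU(inplace=True),
    nn.Linear(hidden, hidden),
)
model = nn.Sequential(backbone, projector)

# Pre-training: 1000 epochs for CIFAR (400 for ImageNet-100), base lr 0.3,
# linear warm-up for 10 epochs followed by cosine annealing.
epochs, warmup_epochs = 1000, 10
optimizer = torch.optim.SGD(model.parameters(), lr=0.3,        # LARS in the paper
                            momentum=0.9, weight_decay=1e-4)   # assumed values
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer,
    schedulers=[
        torch.optim.lr_scheduler.LinearLR(optimizer, start_factor=0.01,
                                          total_iters=warmup_epochs),
        torch.optim.lr_scheduler.CosineAnnealingLR(optimizer,
                                                   T_max=epochs - warmup_epochs),
    ],
    milestones=[warmup_epochs],
)

# Linear probe: 100 epochs of SGD with step decay at epochs 60 and 80.
probe = nn.Linear(feat_dim, 100)                               # e.g. ImageNet-100
probe_opt = torch.optim.SGD(probe.parameters(), lr=0.1, momentum=0.9)  # lr assumed
probe_sched = torch.optim.lr_scheduler.MultiStepLR(probe_opt,
                                                   milestones=[60, 80], gamma=0.1)
```

The schedulers above are meant to be stepped once per epoch; the batch sizes of 256 (CIFAR) and 512 (ImageNet-100) quoted in the table would be set in the corresponding data loaders.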
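
The Dataset Splits row also quotes the incoherence computation on the ImageNet-100 representation matrix A ∈ ℝ^{125952×512}, without reproducing the formulas. The sketch below therefore uses the standard definitions from the literature, which is an assumption about how the paper computes these quantities: effective rank as the exponential of the entropy of the normalized singular values, and matrix-completion incoherence as the maximum row leverage score of the leading left singular subspace scaled by n/r, with r set to the rounded effective rank. The function names and the small random test matrix are illustrative.

```python
# Hedged sketch: effective rank and incoherence of a representation matrix,
# using standard definitions (Roy & Vetterli effective rank; matrix-completion
# incoherence). The paper's exact formulas may differ.
import numpy as np

def effective_rank(A: np.ndarray) -> float:
    """exp(H(p)), where p are the singular values of A normalized to sum to 1."""
    s = np.linalg.svd(A, compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]                               # drop exact zeros before the log
    return float(np.exp(-(p * np.log(p)).sum()))

def incoherence(A: np.ndarray) -> float:
    """mu(A) = (n / r) * max_i ||U[i, :r]||^2 with r = round(effective rank)."""
    n = A.shape[0]
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    r = max(1, int(round(effective_rank(A))))
    leverage = (U[:, :r] ** 2).sum(axis=1)     # row leverage scores of the subspace
    return float(n / r * leverage.max())

# Toy stand-in for the 125952 x 512 ImageNet-100 representation matrix.
A = np.random.randn(2048, 512)
print(f"effective rank: {effective_rank(A):.1f}, incoherence: {incoherence(A):.2f}")
```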