Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
SSOLE: Rethinking Orthogonal Low-rank Embedding for Self-Supervised Learning
Authors: Lun Huang, Qiang Qiu, Guillermo Sapiro
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our theoretical analysis and empirical results demonstrate that these adaptations are crucial to SSOLE's effectiveness. Moreover, SSOLE achieves competitive performance across SSL benchmarks without relying on large batch sizes, memory banks, or dual-encoder architectures, making it an efficient and scalable solution for self-supervised tasks. |
| Researcher Affiliation | Collaboration | Lun Huang (Duke University, Princeton University), Qiang Qiu (Purdue University), Guillermo Sapiro (Princeton University, Apple) |
| Pseudocode | No | The paper describes the methodology and mathematical derivations in Sections 3 and 4, and provides implementation details in Appendix C. However, it does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/husthuaan/ssole. |
| Open Datasets | Yes | A ResNet-18 (He et al., 2016) architecture is employed on the ImageNet100 (Deng et al., 2009) dataset. Table 3: Comparative performance on ImageNet100. Table 4: Performance on full ImageNet of different methods. Table 5: Transfer learning on object detection and instance segmentation on MS-COCO. Table 6: Transfer learning on linear classification on various datasets (CIFAR10, CIFAR100, Aircraft, DTD, Flowers). |
| Dataset Splits | Yes | Following the learning rate scaling strategy from SwAV, we set different learning rates for the linear layers and the backbone network weights. Specifically, the linear layers' learning rates are scaled up by 250 times and 20 times for the 1% and 10% tasks, respectively. We determined the optimal base learning rates for the linear layers to be 5.0 for the 1% task and 0.2 for the 10% task after conducting a search in the range of 0.01 to 10. These learning rates are then reduced by a factor of 0.2 at the 12th and 16th epochs during the training period. |
| Hardware Specification | No | Training utilizes a batch size of B = 128 across 4 GPUs, an SGD optimizer with a base learning rate of lr = 2.0, and a cosine decay to 0.002. λ = 0.7. The experiment uses NL = 4 full views and NS = 4 small views with β = 0.6. For the full ImageNet dataset, ResNet-50 is used as the backbone with an enhanced three-layer MLP (8192-d with ReLU and normalization) in the projector, leading to an embedding size of d = 8192. The batch size is set at B = 256, evenly distributed over 8 GPUs. |
| Software Dependencies | No | The paper mentions using specific models like ResNet-18 and ResNet-50, and optimizers like SGD. However, it does not provide specific version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used. |
| Experiment Setup | Yes | ImageNet100 experiments: We employ ResNet-18 as the backbone (fθ) with a three-layer MLP (4096-d hidden layer with ReLU, followed by normalization) as a projector, yielding a final embedding dimension of d = 4096. Training utilizes a batch size of B = 128 across 4 GPUs, an SGD optimizer with a base learning rate of lr = 2.0, and a cosine decay to 0.002. λ = 0.7. The experiment uses NL = 4 full views and NS = 4 small views with β = 0.6. Full ImageNet (1K) experiments: For the full ImageNet dataset, ResNet-50 is used as the backbone with an enhanced three-layer MLP (8192-d with ReLU and normalization) in the projector, leading to an embedding size of d = 8192. The batch size is set at B = 256, evenly distributed over 8 GPUs. The SGD optimizer is used with a base learning rate of lr = 1.0, decaying to 0.001 following a cosine rule. λ = 0.7. The experiment uses NL = 4 full views and NS = 4 small views with β = 0.6. |
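The cosine-decay schedule quoted in the Experiment Setup row (base lr 2.0 decaying to 0.002 on ImageNet100) can be sketched as below. The function name and the exact closed form are assumptions; the paper states only the endpoints and that a cosine rule is used.

```python
import math

def cosine_lr(step: int, total_steps: int,
              base_lr: float = 2.0, final_lr: float = 0.002) -> float:
    """Cosine decay from base_lr at step 0 to final_lr at total_steps.

    Endpoint values follow the ImageNet100 setup quoted above; the
    interpolation formula itself is a common convention, not from the paper.
    """
    progress = step / total_steps
    return final_lr + 0.5 * (base_lr - final_lr) * (1 + math.cos(math.pi * progress))
```

For the full ImageNet (1K) run, the same sketch would use `base_lr=1.0` and `final_lr=0.001`.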
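The Dataset Splits row describes a step schedule for the linear layers in the 1%/10% semi-supervised tasks: base rates of 5.0 and 0.2 respectively, each multiplied by 0.2 at epochs 12 and 16. A minimal sketch of that rule, with a hypothetical function name:

```python
def linear_head_lr(epoch: int, task: str = "1%") -> float:
    """Linear-layer learning rate for the 1% / 10% semi-supervised tasks.

    Base rates (5.0 for 1%, 0.2 for 10%) and the 0.2x reductions at
    epochs 12 and 16 follow the values quoted above.
    """
    base = {"1%": 5.0, "10%": 0.2}[task]
    decay = 1.0
    if epoch >= 12:
        decay *= 0.2
    if epoch >= 16:
        decay *= 0.2
    return base * decay
```

Note that these base rates already include the 250x and 20x scaling relative to the backbone's learning rate mentioned in the same row.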