Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

CG-SSL: Concept-Guided Self-Supervised Learning

Authors: Sara Atito, Josef Kittler, Imran Razzak, Muhammad Awais

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We evaluate the effectiveness of CG-SSL framework through a series of experiments. We begin by outlining our experimental setup, including datasets, architectures, and implementation details. Next, we report quantitative and qualitative results on tasks involving whole-image understanding and dense prediction. Finally, we conduct ablation studies to validate key components of our method.
Researcher Affiliation Academia Sara Atito (1,2), Josef Kittler (2), Imran Razzak (3), Muhammad Awais (1,2). (1) Surrey Institute for People-Centred AI, University of Surrey, UK; (2) Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey, UK; (3) Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), Abu Dhabi, UAE
Pseudocode No The paper describes methods and algorithms using mathematical formulas and descriptive text, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code No Code and pretrained models will be released. Code and pretrained models will be made publicly available to facilitate reproducibility and ease of use. Additionally, code will be submitted with the supplementary materials.
Open Datasets Yes Pretraining Dataset: Our main models are pretrained on ImageNet-1K without labels. For ablation studies, we pretrain on a combination of dense datasets, namely PASCAL VOC [50], Visual Genome [51], and MS-COCO [52], which together provide approximately 170K diverse images.
Dataset Splits Yes We use standard benchmarks with pre-defined train/val/test splits.
Hardware Specification Yes All models are trained using the AdamW optimiser with a cosine learning rate schedule and an effective batch size of 256 distributed across 8 GPUs. We train ViT-S for 800 epochs, ViT-B for 500 epochs, and ViT-L for 300 epochs. Further training and ablation details are included in the Appendix. CG-SSL takes approximately 24 minutes per epoch to pre-train a ViT-B/16 model using 8 GPUs with an effective batch size of 256.
Software Dependencies No Our code builds upon publicly available repositories licensed under CC-BY 4.0. All such sources are clearly credited within the codebase wherever they are used, in accordance with the license terms.
Experiment Setup Yes Implementation Details: We use ViT backbone with patch size 16 × 16, following standard ViT-S/16, ViT-B/16, and ViT-L/16. We use N = 4 concept tokens and the clustering module C consists of L = 4 transformer decoder blocks. The output of C, along with the encoder's [CLS] and [patch] tokens, are passed through a shared projection head comprising two linear layers with 2048 units and GELU activations, followed by a 256-dimensional bottleneck. The output is L2-normalised and projected into an 8192-dimensional embedding space. All models are trained using the AdamW optimiser with a cosine learning rate schedule and an effective batch size of 256 distributed across 8 GPUs. We train ViT-S for 800 epochs, ViT-B for 500 epochs, and ViT-L for 300 epochs.
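
The shared projection head described in the Experiment Setup row can be sketched in plain Python to make the layer order and output widths concrete. This is one possible reading of the quoted text, not the authors' implementation: it assumes two hidden 2048-unit layers with GELU, then a linear 256-dimensional bottleneck, then L2 normalisation, then a bias-free linear map to 8192 dimensions. The names HeadConfig, init_head and head_forward, the uniform initialisation, the bias-free final layer, and the default input width of 384 (the usual ViT-S/16 token width) are all assumptions made for illustration.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class HeadConfig:
    in_dim: int = 384          # assumption: ViT-S/16 token width
    hidden_dim: int = 2048     # from the quoted text
    bottleneck_dim: int = 256  # from the quoted text
    out_dim: int = 8192        # from the quoted text


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def linear(x, weight, bias=None):
    """Compute y = W x (+ b); weight is a list of rows, each as long as x."""
    y = [sum(w * v for w, v in zip(row, x)) for row in weight]
    return y if bias is None else [a + b for a, b in zip(y, bias)]


def l2_normalise(x, eps=1e-12):
    """Scale x to unit Euclidean length (eps guards against a zero vector)."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / max(norm, eps) for v in x]


def init_linear(rng, out_dim, in_dim):
    """Hypothetical uniform initialisation; the summary does not specify one."""
    bound = 1.0 / math.sqrt(in_dim)
    return [[rng.uniform(-bound, bound) for _ in range(in_dim)]
            for _ in range(out_dim)]


def init_head(cfg: HeadConfig, seed: int = 0):
    """Build the four weight matrices of the head (biases start at zero)."""
    rng = random.Random(seed)
    return {
        "w1": init_linear(rng, cfg.hidden_dim, cfg.in_dim),
        "b1": [0.0] * cfg.hidden_dim,
        "w2": init_linear(rng, cfg.hidden_dim, cfg.hidden_dim),
        "b2": [0.0] * cfg.hidden_dim,
        "w3": init_linear(rng, cfg.bottleneck_dim, cfg.hidden_dim),
        "b3": [0.0] * cfg.bottleneck_dim,
        # Assumption: the final projection has no bias term.
        "w_out": init_linear(rng, cfg.out_dim, cfg.bottleneck_dim),
    }


def head_forward(params, token):
    """Map one token to (unit-norm bottleneck vector, final embedding)."""
    h = [gelu(v) for v in linear(token, params["w1"], params["b1"])]
    h = [gelu(v) for v in linear(h, params["w2"], params["b2"])]
    z = l2_normalise(linear(h, params["w3"], params["b3"]))
    return z, linear(z, params["w_out"])


# Demo at reduced widths so the pure-Python loops stay fast;
# HeadConfig() with no arguments gives the widths quoted in the text.
small = HeadConfig(in_dim=8, hidden_dim=16, bottleneck_dim=4, out_dim=32)
params = init_head(small)
token = [0.1 * i for i in range(small.in_dim)]
z, logits = head_forward(params, token)
print(len(z), len(logits))
```

Because the head is described as shared, the same params would be applied to each [CLS], [patch], and concept token in turn. A practical implementation would use a tensor library and batch all tokens at once; the per-token loops here are only meant to show the data flow.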