FLSL: Feature-level Self-supervised Learning

Authors: Qing Su, Anton Netchaev, Hai Li, Shihao Ji

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this section, we evaluate the performance of FLSL by conducting extensive experiments. Specifically, we compare FLSL to existing SSL approaches on multiple dense prediction benchmarks: (i) MS-COCO [42] object detection and instance segmentation, (ii) UAVDT [23] object detection from UAV platforms, and (iii) DAVIS video instance segmentation [46]. Moreover, we investigate the properties of FLSL features in terms of semantic alignment and feature separability in the embedding space."
Researcher Affiliation | Academia | Qing Su¹, Anton Netchaev², Hai Li³, and Shihao Ji¹ (¹Georgia State University, ²U.S. Army ERDC, ³Duke University)
Pseudocode | Yes | "Pseudo-code, training details, and settings of augmentation pipeline are provided in Appendix E."
Open Source Code | Yes | "The source code is available at https://github.com/ISL-CV/FLSL."
Open Datasets | Yes | "We compare FLSL to existing SSL approaches on multiple dense prediction benchmarks: (i) MS-COCO [42] object detection and instance segmentation, (ii) UAVDT [23] object detection from UAV platforms, and (iii) DAVIS video instance segmentation [46]... Models are pretrained on ImageNet-1k [52] dataset using AdamW optimizer [45] with a batch size of 512." (See the dataset-loading sketch after the table.)
Dataset Splits | No | The paper mentions using standard schedules and training recipes but does not provide explicit training/validation/test splits, whether as percentages, sample counts, or citations to predefined splits.
Hardware Specification | Yes | "All our experiments are performed on Nvidia RTX A6000."
Software Dependencies | No | The paper mentions using the AdamW optimizer and following DeiT for the ViT implementation, but it does not provide version numbers for software components or libraries (e.g., Python, PyTorch, CUDA).
Experiment Setup | Yes | "The coefficients of Eq. 13 in our experiments are υ = .03, = 1 and γ = 5 unless stated otherwise. We assume a uniform prior, i.e., π_k = 1/K, ∀k. Models are pretrained on ImageNet-1k [52] dataset using AdamW optimizer [45] with a batch size of 512. All ViT models are pretrained for 300 epochs as in most baselines for a fair comparison."
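
To make the quoted experiment setup concrete, here is a minimal sketch of the pretraining configuration, assuming a timm ViT-S/16 backbone. Only the optimizer choice, batch size, epoch count, and Eq. 13 coefficients come from the quoted text; the model variant, learning rate, weight decay, and cluster count are assumptions (DeiT-style defaults and a placeholder), not values confirmed by the paper or its released code.

```python
# Minimal sketch of the quoted pretraining setup (not the authors' code).
# Quoted: AdamW, batch size 512, 300 epochs, upsilon = 0.03, gamma = 5,
# uniform cluster prior. Assumed: ViT-S/16 backbone, DeiT-style lr/wd.
import timm
import torch

model = timm.create_model("vit_small_patch16_224", pretrained=False)

# AdamW as quoted; lr and weight decay are assumed DeiT-style defaults.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4, weight_decay=0.05)

K = 1024  # number of clusters is a placeholder; the paper's value is not quoted
config = {
    "batch_size": 512,  # quoted
    "epochs": 300,      # quoted
    "upsilon": 0.03,    # Eq. 13 coefficient, quoted
    "gamma": 5.0,       # Eq. 13 coefficient, quoted
    "prior": 1.0 / K,   # uniform prior pi_k = 1/K, as quoted
}
```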
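
For the Open Datasets row, a similarly hedged sketch of loading the two datasets that have built-in torchvision loaders. The local paths are hypothetical placeholders; UAVDT and DAVIS are distributed by their respective benchmark sites and are not covered by torchvision.

```python
# Hedged sketch of loading the public datasets named above with torchvision.
# Paths are hypothetical placeholders; CocoDetection requires pycocotools.
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()

# ImageNet-1k pretraining data, assuming the standard ImageFolder layout.
imagenet_train = datasets.ImageFolder("data/imagenet/train", transform=to_tensor)

# MS-COCO 2017 for detection / instance segmentation fine-tuning.
coco_train = datasets.CocoDetection(
    root="data/coco/train2017",
    annFile="data/coco/annotations/instances_train2017.json",
    transform=to_tensor,
)

# UAVDT and DAVIS have no torchvision loaders; they would need custom
# torch.utils.data.Dataset classes built from the benchmark downloads.
```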