Can Vision Transformers Learn without Natural Images?

Authors: Kodai Nakashima, Hirokatsu Kataoka, Asato Matsumoto, Kenji Iwata, Nakamasa Inoue, Yutaka Satoh
Pages: 1990-1998

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we experimentally verify that the results of formula-driven supervised learning (FDSL) framework are comparable with, and can even partially outperform, sophisticated self-supervised learning (SSL) methods like SimCLRv2 and MoCov2 without using any natural images in the pre-training phase.
Researcher Affiliation | Academia | (1) National Institute of Advanced Industrial Science and Technology (AIST), Japan; (2) University of Tsukuba, Japan; (3) Tokyo Institute of Technology, Japan. {nakashima.kodai, hirokatsu.kataoka, matsumoto-a, kenji.iwata, yu.satou}@aist.go.jp, inoue@c.titech.ac.jp
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not include an explicit statement about releasing source code or a link to a code repository.
Open Datasets | Yes | In experiments using the CIFAR-10 dataset, we show that our model achieved a performance rate of 97.8, which is comparable to the rate of 97.4 achieved with SimCLRv2 and 98.0 achieved with ImageNet. We evaluate the CIFAR-10/100 (C10/C100), Stanford Cars (Cars), and Flowers-102 (Flowers) datasets. We also evaluate the models on IN100, P30, and Pascal VOC 2012 (VOC12).
Dataset Splits | No | The paper mentions 'Val. accuracy transition during fine-tuning on CIFAR-10' in Figure 1, implying a validation set was used, but it does not provide specific details on the split percentages or sample counts for validation data.
Hardware Specification | No | The paper states: 'Computational resource of AI Bridging Cloud Infrastructure (ABCI) provided by National Institute of Advanced Industrial Science and Technology (AIST) was used.' However, it does not specify any particular hardware components such as GPU models, CPU types, or memory amounts.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | Yes | We evaluate up to 300 epochs to determine if they provide additional improvements to FractalDB pre-training. We explore an effective FractalDB configuration for ViT using the process described in (Kataoka et al. 2020). We began by carrying out experiments related to the FDSL-family (FractalDB, BezierCurveDB, and PerlinNoiseDB; see Table 1), architectures (ViT, gMLP, and CNN; see Table 2), and #category/#instance (see Figure 4). Grayscale vs. color. Number of transformations. Range of transformation parameters. Training epoch.
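The Experiment Setup entry names "number of transformations" and "range of transformation parameters" as configuration axes without showing what they control. The sketch below illustrates them, assuming the iterated function system (IFS) formulation of FractalDB from Kataoka et al. 2020: an image is drawn by repeatedly applying randomly chosen affine maps to a point. The function names, the default parameter range of 0.5 (chosen here only so the maps stay contractive), the burn-in length, and the min/max normalization are all illustrative assumptions, not the authors' settings. Only grayscale rendering is shown.

```python
import random


def sample_ifs(num_transforms, param_range=0.5, rng=None):
    """Sample affine maps (a, b, c, d, e, f), each value uniform in
    [-param_range, param_range]. Hypothetical helper, not the paper's code."""
    rng = rng or random.Random()
    return [
        tuple(rng.uniform(-param_range, param_range) for _ in range(6))
        for _ in range(num_transforms)
    ]


def render_fractal(transforms, size=64, num_points=20000, burn_in=20, rng=None):
    """Render a size x size grayscale grid (values 0 or 255) by iterating
    randomly chosen affine maps and marking the visited points."""
    rng = rng or random.Random()
    x, y = 0.0, 0.0
    points = []
    for step in range(num_points):
        a, b, c, d, e, f = rng.choice(transforms)
        x, y = a * x + b * y + e, c * x + d * y + f
        # Guard for wide parameter ranges, where maps may not be contractive.
        if abs(x) > 1e6 or abs(y) > 1e6:
            break
        if step >= burn_in:  # skip early points before the orbit settles
            points.append((x, y))

    grid = [[0] * size for _ in range(size)]
    if not points:
        return grid

    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    min_x, min_y = min(xs), min(ys)
    span_x = (max(xs) - min_x) or 1.0  # avoid division by zero
    span_y = (max(ys) - min_y) or 1.0
    for px, py in points:
        col = min(size - 1, int((px - min_x) / span_x * size))
        row = min(size - 1, int((py - min_y) / span_y * size))
        grid[row][col] = 255
    return grid


if __name__ == "__main__":
    rng = random.Random(0)
    maps = sample_ifs(num_transforms=4, rng=rng)
    image = render_fractal(maps, size=32, num_points=5000, rng=rng)
    # Print a coarse ASCII preview of the rendered pattern.
    for row in image:
        print("".join("#" if v else "." for v in row))
```

In this sketch, changing `num_transforms` or `param_range` changes the family of shapes that can be produced, which is the kind of knob the configuration search varies. Under the FractalDB formulation, each sampled set of maps would correspond to one category.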