Curved Representation Space of Vision Transformers

Authors: Juyeop Kim, Junha Park, Songkuk Kim, Jong-Seok Lee

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental
  "We resolve this contradiction by empirically investigating how the output of the penultimate layer moves in the representation space as the input data moves linearly within a small area. In particular, we show the following. (1) While CNNs exhibit fairly linear relationship between the input and output movements, Transformers show nonlinear relationship for some data. ... We empirically show that this curved representation space results in the aforementioned contradiction."
Researcher Affiliation | Academia
  "Juyeop Kim, Junha Park, Songkuk Kim*, Jong-Seok Lee*; School of Integrated Technology / BK21 Graduate Program in Intelligent Semiconductor Technology, Yonsei University, Korea; {juyeopkim, junha.park, songkuk, jong-seok.lee}@yonsei.ac.kr"
Pseudocode | Yes
  "Refer to Algorithm 1 in Appendix for the detailed procedure to solve the optimization problem in Eq. 3."
Open Source Code | No
  The paper links to its arXiv preprint (https://arxiv.org/abs/2210.05742), which is the paper itself; it contains no explicit statement or specific link providing access to the source code for the described methodology.
Open Datasets | Yes
  "An image data from ImageNet (Russakovsky et al. 2015)..." and "Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; Berg, A. C.; and Fei-Fei, L. 2015. ImageNet large scale visual recognition challenge. IJCV, 115(3): 211-252."
Dataset Splits | Yes
  "We compare the calibration of CNNs... and Transformers... on the ImageNet validation set" and "Fig. 3 shows the obtained values of ϵ with respect to the confidence values for the ImageNet validation data". The ImageNet validation set is a commonly used predefined split.
Hardware Specification | No
  The paper discusses various models, datasets, and experimental procedures, but does not specify hardware details such as GPU models, CPU types, or other computing resources used to run the experiments.
Software Dependencies | No
  The paper does not provide version numbers for any software dependencies, libraries, or frameworks used in the experiments (e.g., Python, PyTorch, TensorFlow, CUDA).
Experiment Setup | Yes
  "Figs. 5a-5d show the direction changes in travel for ResNet50, MobileNetV2, ViT-B/16 and Swin-T when d = d_FGSM, ϵ = .05, and N = 50. ... We set the maximum amount of perturbation to ϵ_IFGSM = .001 or .002, the number of iterations to T = 10, and the step size to ϵ_IFGSM/T."
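The quoted IFGSM setup (maximum perturbation ϵ_IFGSM, T = 10 iterations, step size ϵ_IFGSM/T) can be sketched in a few lines. This is a minimal NumPy sketch of standard iterative FGSM under those hyperparameters, not the authors' code; the `grad_fn` callback and the toy linear loss at the bottom are illustrative assumptions standing in for a real network's input gradient.

```python
import numpy as np

def ifgsm(x, grad_fn, eps=0.001, T=10):
    """Iterative FGSM with step size eps/T, matching the quoted setup.

    grad_fn(x) returns the gradient of the loss w.r.t. the input
    (for a real model this would come from backpropagation).
    """
    alpha = eps / T                       # step size = eps_IFGSM / T
    x_adv = x.astype(float).copy()
    for _ in range(T):
        # take a signed-gradient step
        x_adv = x_adv + alpha * np.sign(grad_fn(x_adv))
        # keep the perturbation inside the eps-ball around the original input
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv

# Toy stand-in: a linear loss L(x) = w.x, whose input gradient is just w.
w = np.array([1.0, -2.0, 0.5])
x = np.zeros(3)
x_adv = ifgsm(x, lambda z: w, eps=0.001, T=10)
```

With a constant gradient the attack simply walks ϵ/T per step in the signed-gradient direction, so after T steps the perturbation saturates the ϵ-ball; with a real model the gradient is re-evaluated at each iterate, which is the point of the iterative variant.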