Dissecting Query-Key Interaction in Vision Transformers

Authors: Xu Pan, Aaron Philip, Ziqian Xie, Odelia Schwartz

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We utilized a dataset that has been applied to studying visual salience [24], namely the Odd-One-Out (O3) dataset [29]. |
| Researcher Affiliation | Academia | 1 University of Miami; 2 Harvard University; 3 Michigan State University; 4 University of Texas Health Science Center at Houston |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code for this work is available at: https://github.com/schwartz-cnl/DissectingViT |
| Open Datasets | Yes | We utilized a dataset that has been applied to studying visual salience [24], namely the Odd-One-Out (O3) dataset [29]. For each mode, we show the top 8 images in the ImageNet (Hugging Face version) [36] validation set that induce the largest attention score. |
| Dataset Splits | Yes | For each mode, we show the top 8 images in the ImageNet (Hugging Face version) [36] validation set that induce the largest attention score. |
| Hardware Specification | No | Our experiments do not require compute resources beyond a personal computer with a GPU. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | In this study, the "attention score" is defined as the dot product of every query and key pair, which has the shape of the number of tokens by the number of tokens and is defined per attention head. The "attention map" is the softmax of each query's attention score reshaped into a 2D image, which is defined per attention head and token. (A sketch of these definitions follows the table.) |
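
As a concrete illustration of the Experiment Setup definitions above, here is a minimal PyTorch sketch for a single attention head. The tensor names, the 197-token / 64-dimension shapes, and the handling of the CLS token before the 14 x 14 reshape are illustrative assumptions, not taken from the authors' code.

```python
import torch

# Illustrative shapes for a ViT-B/16-style model: 196 patch tokens plus
# 1 CLS token, 64 dimensions per attention head (assumed, not from the
# authors' code).
num_tokens, head_dim = 197, 64
q = torch.randn(num_tokens, head_dim)  # queries for one attention head
k = torch.randn(num_tokens, head_dim)  # keys for the same head

# "Attention score": the dot product of every query-key pair, giving a
# (num_tokens, num_tokens) matrix per attention head.
attention_score = q @ k.T

# "Attention map" for one query token: the softmax of that query's row
# of scores, reshaped into a 2D image. Dropping the CLS entry to recover
# a 14 x 14 patch grid is an assumption about the token layout.
query_idx = 0
attn = torch.softmax(attention_score[query_idx], dim=-1)
attention_map = attn[1:].reshape(14, 14)
```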