Transformer Doctor: Diagnosing and Treating Vision Transformers

Authors: Jiacong Hu, Hao Chen, Kejia Chen, Yang Gao, Jingwen Ye, Xingen Wang, Mingli Song, Zunlei Feng

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through a plethora of quantitative and qualitative experiments, it has been demonstrated that Transformer Doctor can effectively address internal errors in transformers, thereby enhancing model performance.
Researcher Affiliation | Collaboration | Jiacong Hu (1,4), Hao Chen (1), Kejia Chen (2), Yang Gao (6), Jingwen Ye (3), Xingen Wang (1,6), Mingli Song (1,4,5), Zunlei Feng (2,4,5); 1 College of Computer Science and Technology, Zhejiang University; 2 School of Software Technology, Zhejiang University; 3 Electrical and Computer Engineering, National University of Singapore; 4 State Key Laboratory of Blockchain and Data Security, Zhejiang University; 5 Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security; 6 Bangsheng Technology Co., Ltd.
Pseudocode | No | The paper describes methods and formulas but does not include any pseudocode or algorithm blocks.
Open Source Code | Yes | For more information, please visit https://transformer-doctor.github.io/. Additionally, the algorithm code for the Transformer Doctor is included in the uploaded source_codes.zip file.
Open Datasets | Yes | To validate the effectiveness of Transformer Doctor, we conducted experiments on five mainstream datasets: CIFAR-10 [67], CIFAR-100 [67], ImageNet-10 [68], ImageNet-50 [69], and ImageNet-1k [68].
Dataset Splits | No | The paper mentions training and testing but does not provide the specific train/validation/test splits (percentages or counts) needed for reproduction.
Hardware Specification | Yes | In the experiments, we utilized two Linux servers, each equipped with 8 NVIDIA A6000 GPU cards, 24 CPU cores, and 500GB of memory.
Software Dependencies | No | The paper mentions using the AdamW [70] optimizer but does not provide version numbers for key software components or libraries.
Experiment Setup | Yes | During all training stages, each dataset was trained for 300 epochs using the AdamW [70] optimizer, with an initial learning rate of 0.01. The learning rate decayed according to a cosine annealing schedule, with T_max set to 300 epochs. Additionally, α and β were set to default values of 10 and 100, respectively, to balance the loss functions. The default value of τ was 0.15, and the constrained loss function was applied by default to the last block.
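
The hyperparameters quoted in the Experiment Setup row map onto standard PyTorch components. The sketch below is a minimal, hedged reconstruction of that configuration: the tiny model, the random-data loader, and the constrained_losses helper are hypothetical stand-ins (the paper's actual Transformer Doctor losses and ViT backbone are not reproduced here), while the optimizer, learning rate, cosine schedule, epoch count, and the weights α = 10 and β = 100 follow the quoted description.

```python
import torch
from torch import nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

# Hyperparameters quoted from the Experiment Setup row.
EPOCHS = 300               # each dataset trained for 300 epochs
INIT_LR = 0.01             # initial learning rate
ALPHA, BETA = 10.0, 100.0  # default weights balancing the loss terms
TAU = 0.15                 # default threshold tau (unused in this stub)

# Hypothetical stand-ins: a tiny classifier and random batches replace the
# ViT backbone and the CIFAR/ImageNet loaders used in the paper.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
train_loader = [(torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,)))
                for _ in range(4)]

optimizer = AdamW(model.parameters(), lr=INIT_LR)
scheduler = CosineAnnealingLR(optimizer, T_max=EPOCHS)  # cosine decay, T_max = 300
ce = nn.CrossEntropyLoss()

def constrained_losses(m):
    """Hypothetical placeholder for the paper's constrained loss terms,
    which are applied to the last block in the actual method."""
    return torch.tensor(0.0), torch.tensor(0.0)

for epoch in range(EPOCHS):
    for images, labels in train_loader:
        logits = model(images)
        l_a, l_b = constrained_losses(model)
        # Total objective: task loss + alpha * L_a + beta * L_b.
        loss = ce(logits, labels) + ALPHA * l_a + BETA * l_b
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()  # one cosine-annealing step per epoch
```

This only illustrates how the reported optimizer, schedule, and loss weights would be wired together; reproducing the paper's results additionally requires its released code and loss definitions (see the Open Source Code row above).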