Transformer in Transformer
Authors: Kai Han, An Xiao, Enhua Wu, Jianyuan Guo, Chunjing Xu, Yunhe Wang
NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on several benchmarks demonstrate the effectiveness of the proposed TNT architecture, e.g., we achieve an 81.5% top-1 accuracy on the ImageNet, which is about 1.7% higher than that of the state-of-the-art visual transformer with similar computational cost. |
| Researcher Affiliation | Collaboration | Kai Han (1,2), An Xiao (2), Enhua Wu (1,3), Jianyuan Guo (2), Chunjing Xu (2), Yunhe Wang (2); 1: State Key Lab of Computer Science, ISCAS & UCAS; 2: Huawei Noah's Ark Lab; 3: University of Macau |
| Pseudocode | No | The paper provides architectural descriptions, mathematical formulas for components like MSA, MLP, and LN, and an illustration in Figure 1, but it does not contain a dedicated pseudocode or algorithm block. (A hedged structural sketch of the described block is given after this table.) |
| Open Source Code | Yes | The PyTorch code is available at https://github.com/huawei-noah/CV-Backbones, and the MindSpore code is available at https://gitee.com/mindspore/models/tree/master/research/cv/TNT. |
| Open Datasets | Yes | ImageNet ILSVRC 2012 [26] is an image classification benchmark consisting of 1.2M training images belonging to 1000 classes, and 50K validation images with 50 images per class. ... The details of used visual datasets are listed in Table 2. ... For the license of ImageNet dataset, please refer to http://www.image-net.org/download. ... For the licenses of these datasets, please refer to the original papers. |
| Dataset Splits | Yes | ImageNet ILSVRC 2012 [26] is an image classification benchmark consisting of 1.2M training images belonging to 1000 classes, and 50K validation images with 50 images per class. ... The details of used visual datasets are listed in Table 2. |
| Hardware Specification | Yes | All the models are implemented with PyTorch [24] and MindSpore [15] and trained on NVIDIA Tesla V100 GPUs. |
| Software Dependencies | No | The paper mentions software such as PyTorch [24] and MindSpore [15] but does not specify their versions, nor the versions of any other key libraries or dependencies. |
| Experiment Setup | Yes | We utilize the training strategy provided in DeiT [31]. The main advanced technologies apart from common settings [12] include AdamW [20], label smoothing [27], DropPath [18], and repeated augmentation [14]. We list the hyper-parameters in Table 3 for better understanding. ... Table 3 (default training hyper-parameters used in our method, unless stated otherwise): Epochs 300; Optimizer AdamW; Batch size 1024; Learning rate 1e-3; LR decay cosine; Weight decay 0.05; Warmup epochs 5; Label smoothing 0.1; Drop path 0.1; Repeated Aug used. (A configuration sketch of these settings follows the table.) |
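
Since the paper provides no pseudocode, the following is a minimal, hypothetical PyTorch sketch of the block structure it describes: an inner transformer updates pixel-level tokens inside each patch, the flattened result is projected and added to that patch's outer token, and an outer transformer then updates the patch-level tokens. Module names, dimensions, and the use of `nn.MultiheadAttention` are assumptions made for illustration; this is not the authors' released implementation (see the linked repositories for that).

```python
import torch
import torch.nn as nn


class MLP(nn.Module):
    """Two-layer feed-forward block: Linear -> GELU -> Linear."""
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden_dim)
        self.act = nn.GELU()
        self.fc2 = nn.Linear(hidden_dim, dim)

    def forward(self, x):
        return self.fc2(self.act(self.fc1(x)))


class TNTBlock(nn.Module):
    """Hypothetical sketch of one TNT block (not the authors' code):
    inner transformer over pixel-level tokens, projection of the
    flattened inner tokens onto the patch embedding, then an outer
    transformer over patch-level tokens. Pre-LayerNorm throughout."""
    def __init__(self, outer_dim, inner_dim, num_pixels, num_heads=4):
        super().__init__()
        # inner (pixel-level, "word") transformer
        self.inner_norm1 = nn.LayerNorm(inner_dim)
        self.inner_attn = nn.MultiheadAttention(inner_dim, num_heads, batch_first=True)
        self.inner_norm2 = nn.LayerNorm(inner_dim)
        self.inner_mlp = MLP(inner_dim, 4 * inner_dim)
        # projection of flattened inner tokens onto the outer embedding
        self.proj = nn.Linear(num_pixels * inner_dim, outer_dim)
        # outer (patch-level, "sentence") transformer
        self.outer_norm1 = nn.LayerNorm(outer_dim)
        self.outer_attn = nn.MultiheadAttention(outer_dim, num_heads, batch_first=True)
        self.outer_norm2 = nn.LayerNorm(outer_dim)
        self.outer_mlp = MLP(outer_dim, 4 * outer_dim)

    def forward(self, inner_tokens, outer_tokens):
        # inner_tokens: (batch * n_patches, num_pixels, inner_dim)
        # outer_tokens: (batch, n_patches + 1, outer_dim), index 0 is the class token
        y = self.inner_norm1(inner_tokens)
        inner_tokens = inner_tokens + self.inner_attn(y, y, y)[0]
        inner_tokens = inner_tokens + self.inner_mlp(self.inner_norm2(inner_tokens))

        batch, n_plus_1, _ = outer_tokens.shape
        # fuse the flattened inner features into the patch (non-class) outer tokens
        fused = self.proj(inner_tokens.reshape(batch, n_plus_1 - 1, -1))
        outer_tokens = torch.cat([outer_tokens[:, :1],
                                  outer_tokens[:, 1:] + fused], dim=1)

        z = self.outer_norm1(outer_tokens)
        outer_tokens = outer_tokens + self.outer_attn(z, z, z)[0]
        outer_tokens = outer_tokens + self.outer_mlp(self.outer_norm2(outer_tokens))
        return inner_tokens, outer_tokens
```

The class token carries no inner tokens in this sketch, so the fused features are added only to the patch tokens; the released code may differ in such details.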
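Likewise, the Table 3 hyper-parameters quoted in the Experiment Setup row can be collected into a single configuration dictionary. The key names below are assumptions in a DeiT/timm style, not the authors' exact flags; only the values come from the quoted table and text.

```python
# Hypothetical training configuration mirroring Table 3 of the paper.
tnt_train_config = {
    "epochs": 300,
    "optimizer": "adamw",            # AdamW [20]
    "batch_size": 1024,
    "learning_rate": 1e-3,
    "lr_schedule": "cosine",
    "weight_decay": 0.05,
    "warmup_epochs": 5,
    "label_smoothing": 0.1,          # label smoothing [27]
    "drop_path_rate": 0.1,           # DropPath / stochastic depth [18]
    "repeated_augmentation": True,   # listed among the techniques used [14]
}
```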