Analogous to Evolutionary Algorithm: Designing a Unified Sequence Model
Authors: Jiangning Zhang, Chao Xu, Jian Li, Wenzhou Chen, Yabiao Wang, Ying Tai, Shuo Chen, Chengjie Wang, Feiyue Huang, Yong Liu
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our approach achieves state-of-the-art results on the ImageNet classification task compared with recent vision transformer works while having smaller parameters and greater throughput. We further conduct multi-modal tasks to demonstrate the superiority of the unified EAT, e.g., Text-Based Image Retrieval, and our approach improves the rank-1 by +3.7 points over the baseline on the CSS dataset. |
| Researcher Affiliation | Collaboration | Jiangning Zhang¹, Chao Xu¹, Jian Li², Wenzhou Chen¹, Yabiao Wang², Ying Tai², Shuo Chen³, Chengjie Wang², Feiyue Huang², Yong Liu¹ — ¹APRIL Lab, Zhejiang University; ²Youtu Lab, Tencent; ³RIKEN Center for Advanced Intelligence Project |
| Pseudocode | No | The paper describes methods and equations but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/TencentYoutuResearch/BaseArchitecture-EAT. |
| Open Datasets | Yes | The classification task is evaluated on the ImageNet-1k dataset [20], and we conduct all experiments on a single node with 8 V100 GPUs. TIR is conducted on the Fashion200k [27], MIT-States [34], and CSS [71] datasets, while VLN is performed on the R2R navigation dataset [1]. |
| Dataset Splits | No | The paper mentions training, but does not explicitly state the training/validation/test dataset splits (e.g., 80/10/10 split or specific counts for each split). |
| Hardware Specification | Yes | The classification task is evaluated on the ImageNet-1k dataset [20], and we conduct all experiments on a single node with 8 V100 GPUs. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., PyTorch 1.9, Python 3.8). |
| Experiment Setup | Yes | By default, we train each model for 300 epochs from scratch without pre-training and distillation. The classification task is evaluated on the ImageNet-1k dataset [20]... Our method is trained at 224 resolution for 300 epochs without distillation... Table 4 shows ablation results for several items on ImageNet-1k. Default represents the baseline method based on EAT-B; the gray font indicates that the corresponding parameter is not modified. Ablation items include Head Layers, Local Ratio, FFN Ratio, Kernel Size, Local Operator, and Image Size, with Top-1 accuracy reported for each. |