Analogous to Evolutionary Algorithm: Designing a Unified Sequence Model
Authors: Jiangning Zhang, Chao Xu, Jian Li, Wenzhou Chen, Yabiao Wang, Ying Tai, Shuo Chen, Chengjie Wang, Feiyue Huang, Yong Liu
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our approach achieves state-of-the-art results on the ImageNet classification task compared with recent vision transformer works while having smaller parameters and greater throughput. We further conduct multi-modal tasks to demonstrate the superiority of the unified EAT, e.g., Text-Based Image Retrieval, and our approach improves the rank-1 by +3.7 points over the baseline on the CSS dataset. |
| Researcher Affiliation | Collaboration | Jiangning Zhang¹, Chao Xu¹, Jian Li², Wenzhou Chen¹, Yabiao Wang², Ying Tai², Shuo Chen³, Chengjie Wang², Feiyue Huang², Yong Liu¹ — ¹APRIL Lab, Zhejiang University; ²Youtu Lab, Tencent; ³RIKEN Center for Advanced Intelligence Project |
| Pseudocode | No | The paper describes methods and equations but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/TencentYoutuResearch/BaseArchitecture-EAT. |
| Open Datasets | Yes | The classification task is evaluated on the ImageNet-1k dataset [20], and we conduct all experiments on a single node with 8 V100 GPUs. TIR is conducted on the Fashion200k [27], MIT-States [34], and CSS [71] datasets, while VLN is performed on the R2R navigation dataset [1]. |
| Dataset Splits | No | The paper mentions training, but does not explicitly state the training/validation/test dataset splits (e.g., 80/10/10 split or specific counts for each split). |
| Hardware Specification | Yes | The classification task is evaluated on the ImageNet-1k dataset [20], and we conduct all experiments on a single node with 8 V100 GPUs. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., PyTorch 1.9, Python 3.8). |
| Experiment Setup | Yes | By default, we train each model for 300 epochs from scratch without pre-training and distillation. The classification task is evaluated on the ImageNet-1k dataset [20]... Our method is trained at 224 resolution for 300 epochs without distillation... Table 4 shows ablation results for several items on ImageNet-1k. Default represents the baseline method based on EAT-B; the gray font indicates that the corresponding parameter is not modified. Ablation items include Head Layers, Local Ratio, FFN Ratio, Kernel Size, Local Operator, and Image Size, with Top-1 accuracy reported for each. |