Transforming Vision Transformer: Towards Efficient Multi-Task Asynchronous Learner
Authors: Hanwen Zhong, Jiaxin Chen, Yutong Zhang, Di Huang, Yunhong Wang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on public benchmarks demonstrate the superiority of our method, compared to the state-of-the-art multi-task learning approaches. |
| Researcher Affiliation | Academia | Hanwen Zhong (1,2), Jiaxin Chen (1,2), Yutong Zhang (1,2), Di Huang (2), Yunhong Wang (1,2) — (1) State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, China; (2) School of Computer Science and Engineering, Beihang University, Beijing, China |
| Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks. It includes mathematical equations and figures illustrating the framework but no code-like structured algorithms. |
| Open Source Code | Yes | The project page is available at https://github.com/Yewen1486/EMTAL. |
| Open Datasets | Yes | Multi-task FGVC is a collection of public datasets specifically for multi-task fine-grained visual classification, including CUB-200-2011 [39], Stanford Cars [40], FGVC-Aircraft [41] and Oxford Flowers [42]. In addition, we conduct experiments on the Specialized VTAB-1k dataset [43] to validate the effectiveness over previous solutions. ... The train/val/test splits and the number of classes are summarized in Table 8. ... NYUv2 dataset [44]. |
| Dataset Splits | Yes | We follow the standard training/validation splits used in [43] for fair comparisons. ... we adopt the standard training/testing split as depicted in [6]. ... The train/val/test splits are the same as depicted in [6] (Table 8). |
| Hardware Specification | Yes | All experiments are conducted on a single Nvidia GeForce RTX 3090 GPU. |
| Software Dependencies | No | The paper mentions using 'ViT-B/16 pre-trained on ImageNet-21K' and the 'AdamW optimizer'. However, it does not provide specific version numbers for programming languages (e.g., Python), deep learning frameworks (e.g., PyTorch, TensorFlow), or other key libraries. |
| Experiment Setup | Yes | We use the AdamW optimizer [48] to fine-tune our models for 100 epochs and adopt the cosine learning rate decay with a linear warm-up for 10 epochs in all experiments. We fix the hyper-parameter τ in Eq. (8) to 5. As for data augmentation, we employ random resize cropping to 224 × 224 pixels and a random horizontal flip during training, and resize to 248 × 248 pixels with a center crop to 224 × 224 pixels for evaluation. We utilize ViT-B/16 pre-trained on ImageNet-21K [32] as the base model. ... We empirically study the effect of k by using 1, 4, 6, 64 and 192 clusters. |
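The reported schedule (100 epochs total, cosine decay, 10 warm-up epochs) is straightforward to reimplement. A minimal sketch of the per-epoch learning rate is given below; the `base_lr` and `min_lr` values are assumptions for illustration, since the excerpt does not state them.

```python
import math

def lr_at_epoch(epoch, base_lr, total_epochs=100, warmup_epochs=10, min_lr=0.0):
    """Cosine learning-rate decay with linear warm-up, matching the
    reported setup (100 epochs, 10 warm-up epochs). base_lr/min_lr are
    illustrative assumptions, not values from the paper."""
    if epoch < warmup_epochs:
        # Linear warm-up: ramp from base_lr/warmup_epochs up to base_lr.
        return base_lr * (epoch + 1) / warmup_epochs
    # Cosine decay from base_lr down to min_lr over the remaining epochs.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

With `base_lr = 1e-3`, the rate reaches `1e-3` at the end of warm-up (epoch 9) and falls to half that value at the midpoint of the decay phase (epoch 55).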