Transforming Vision Transformer: Towards Efficient Multi-Task Asynchronous Learner
Authors: Hanwen Zhong, Jiaxin Chen, Yutong Zhang, Di Huang, Yunhong Wang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on public benchmarks demonstrate the superiority of our method, compared to the state-of-the-art multi-task learning approaches. |
| Researcher Affiliation | Academia | Hanwen Zhong (1,2), Jiaxin Chen (1,2), Yutong Zhang (1,2), Di Huang (2), Yunhong Wang (1,2) — (1) State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, China; (2) School of Computer Science and Engineering, Beihang University, Beijing, China |
| Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks. It includes mathematical equations and figures illustrating the framework but no code-like structured algorithms. |
| Open Source Code | Yes | The project page is available at https://github.com/Yewen1486/EMTAL. |
| Open Datasets | Yes | Multi-task FGVC is a collection of public datasets specifically for multi-task fine-grained visual classification, including CUB-200-2011 [39], Stanford Cars [40], FGVC-Aircraft [41] and Oxford Flowers [42]. In addition, we conduct experiments on the Specialized VTAB-1k dataset [43] to validate the effectiveness over previous solutions. ... The train/val/test splits and the number of classes are summarized in Table 8. ... NYUv2 dataset [44]. |
| Dataset Splits | Yes | We follow the standard training/validation splits used in [43] for fair comparisons. ... we adopt the standard training/testing split as depicted in [6]. ... The train/val/test splits are the same as depicted in [6] (Table 8). |
| Hardware Specification | Yes | All experiments are conducted on a single Nvidia GeForce RTX 3090 GPU. |
| Software Dependencies | No | The paper mentions using 'ViT-B/16 pre-trained on ImageNet-21K' and the 'AdamW optimizer'. However, it does not provide specific version numbers for programming languages (e.g., Python), deep learning frameworks (e.g., PyTorch, TensorFlow), or other key libraries. |
| Experiment Setup | Yes | We use the AdamW optimizer [48] to fine-tune our models for 100 epochs and adopt the cosine learning rate decay with a linear warm-up for 10 epochs in all experiments. We fix the hyper-parameter τ in Eq. (8) to 5. As for data augmentation, we employ random resize cropping to 224 × 224 pixels and a random horizontal flip during training, and resize to 248 × 248 pixels with a center crop to 224 × 224 pixels for evaluation. We utilize ViT-B/16 pre-trained on ImageNet-21K [32] as the base model. ... We empirically study the effect of k by using 1, 4, 6, 64 and 192 clusters. |
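The reported schedule (100 epochs total, cosine decay, 10 warm-up epochs) is straightforward to reimplement. A minimal sketch of the per-epoch learning rate is given below; the `base_lr` and `min_lr` values are assumptions for illustration, since the excerpt does not state them.

```python
import math

def lr_at_epoch(epoch, base_lr, total_epochs=100, warmup_epochs=10, min_lr=0.0):
    """Cosine learning-rate decay with linear warm-up, matching the
    reported setup (100 epochs, 10 warm-up epochs). base_lr/min_lr are
    illustrative assumptions, not values from the paper."""
    if epoch < warmup_epochs:
        # Linear warm-up: ramp from base_lr/warmup_epochs up to base_lr.
        return base_lr * (epoch + 1) / warmup_epochs
    # Cosine decay from base_lr down to min_lr over the remaining epochs.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

With `base_lr = 1e-3`, the rate reaches `1e-3` at the end of warm-up (epoch 9) and falls to half that value at the midpoint of the decay phase (epoch 55).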