Data-efficient Large Vision Models through Sequential Autoregression
Authors: Zhiwei Hao, Jianyuan Guo, Chengcheng Wang, Yehui Tang, Han Wu, Han Hu, Kai Han, Chang Xu
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical evaluations underscore the model's agility in adapting to various tasks, heralding a significant reduction in the parameter footprint, and a marked decrease in training data requirements, thereby paving the way for more sustainable and accessible advancements in the field of generalist vision models. |
| Researcher Affiliation | Collaboration | Zhiwei Hao* 1, Jianyuan Guo* 2, Chengcheng Wang* 3, Yehui Tang 3, Han Wu 2, Han Hu B 1, Kai Han 3, Chang Xu B 2. Emails: haozhw@bit.edu.cn; {jianyuan.guo,han.wu}@sydney.edu.au; {chengcheng.wang,yehui.tang,kai.han}@huawei.com. Affiliations: 1 School of Information and Electronics, Beijing Institute of Technology, Beijing, China; 2 School of Computer Science, Faculty of Engineering, University of Sydney, Sydney, Australia; 3 Huawei Noah's Ark Lab, Beijing, China. |
| Pseudocode | No | The paper describes its methods using prose and diagrams but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/ggjy/DeLVM. |
| Open Datasets | Yes | Specifically, we enhance underrepresented (tail) datasets by randomly augmenting the training samples. We find that this approach yields stronger performance compared to traditional re-sampling strategy. ... we utilize various subsets of the SA-1B (Kirillov et al., 2023)... |
| Dataset Splits | Yes | Cross-entropy loss and perplexity on a withheld subset of SA-1B (equivalent to 1% of the dataset) serve as the metrics. To evaluate the performance of our trained model, we use a withheld subset of SA-1B, along with the MPII dataset (Andriluka et al., 2014) and the Test2800 dataset (Fu et al., 2017b). |
| Hardware Specification | Yes | Our training strategy adheres to the implementation of LVM (Bai et al., 2023), with slight adjustments made for efficient training with 8-16 A100 GPUs. |
| Software Dependencies | No | The paper states 'Our models are trained based on the InternLM framework (Team, 2023)' but does not provide specific version numbers for this framework or any other software dependencies such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | Table 8: Detailed configurations for training efficient LVMs. We attain a consistent equivalent batch size across different models by adjusting the number of employed GPUs, mini-batch size, and gradient accumulation steps. Configuration values: optimizer: AdamW; learning rate: 1.5e-4; weight decay: 0.1; optimizer momentum: β1, β2 = 0.9, 0.95; equivalent batch size (tokens): 262144; learning rate schedule: cosine; warmup steps: #total steps × 0.0056; final learning rate: 1.5e-5; context length: 2048; data augmentation: RandomResizedCrop. |
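
The training recipe quoted in the Experiment Setup row (AdamW with β1, β2 = 0.9, 0.95, weight decay 0.1, a peak learning rate of 1.5e-4 decaying on a cosine schedule to 1.5e-5 after a short warmup) maps onto standard PyTorch components. The sketch below is only an illustration under that assumption, not the authors' InternLM-based implementation; the stand-in model, the total step count, and the linear-warmup shape are placeholders.

```python
import math
import torch

# Stand-in model; the actual DeLVM backbone (an autoregressive vision model) is not reproduced here.
model = torch.nn.Linear(2048, 2048)

peak_lr, final_lr = 1.5e-4, 1.5e-5
total_steps = 10_000                      # hypothetical; set to the real training length
warmup_steps = int(total_steps * 0.0056)  # "warmup steps = #total steps * 0.0056"

optimizer = torch.optim.AdamW(
    model.parameters(), lr=peak_lr, betas=(0.9, 0.95), weight_decay=0.1
)

def lr_lambda(step: int) -> float:
    """Warm up to the peak LR, then decay on a cosine down to final_lr."""
    if step < warmup_steps:
        return (step + 1) / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return (final_lr + (peak_lr - final_lr) * cosine) / peak_lr

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```

With a context length of 2048 tokens, the equivalent batch size of 262144 tokens corresponds to 128 sequences per optimizer step, which the paper reports is held constant by trading off GPU count, mini-batch size, and gradient-accumulation steps.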
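The Open Datasets row quotes the paper's data-balancing choice: underrepresented (tail) datasets are enhanced by randomly augmenting their training samples rather than by traditional re-sampling. The sketch below illustrates that idea using torchvision's RandomResizedCrop (the augmentation listed in Table 8); the wrapper class, crop size, and (image, label) sample format are assumptions, not the authors' data pipeline.

```python
import random
from torch.utils.data import Dataset
from torchvision import transforms

# RandomResizedCrop matches the augmentation listed in Table 8; the crop size is assumed.
augment = transforms.RandomResizedCrop(256)

class AugmentedOversample(Dataset):
    """Expose a small (tail) dataset at a larger virtual length.

    Indices beyond the original length return randomly augmented variants of
    existing samples, rather than identical re-sampled copies.
    """

    def __init__(self, base: Dataset, target_len: int):
        self.base = base
        self.target_len = max(target_len, len(base))

    def __len__(self) -> int:
        return self.target_len

    def __getitem__(self, idx: int):
        if idx < len(self.base):
            return self.base[idx]
        image, label = self.base[random.randrange(len(self.base))]
        return augment(image), label

# Hypothetical usage: pad a tail dataset up to the size of the largest (head) dataset.
# balanced_tail = AugmentedOversample(tail_dataset, target_len=len(head_dataset))
```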
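The Dataset Splits row reports cross-entropy loss and perplexity on a withheld 1% subset of SA-1B; perplexity is the exponential of the mean per-token cross-entropy. The evaluation loop below sketches that relation for an autoregressive model over visual tokens; the model interface, dataloader format, and device handling are assumptions rather than the authors' evaluation code.

```python
import math
import torch
import torch.nn.functional as F

@torch.no_grad()
def evaluate(model, loader, device="cuda"):
    """Return (mean cross-entropy, perplexity) over a withheld token set."""
    total_loss, total_tokens = 0.0, 0
    for tokens in loader:                  # tokens: (batch, seq_len) visual-token ids
        tokens = tokens.to(device)
        logits = model(tokens[:, :-1])     # predict each next token autoregressively
        loss = F.cross_entropy(
            logits.reshape(-1, logits.size(-1)),
            tokens[:, 1:].reshape(-1),
            reduction="sum",
        )
        total_loss += loss.item()
        total_tokens += tokens[:, 1:].numel()
    mean_ce = total_loss / total_tokens
    return mean_ce, math.exp(mean_ce)      # perplexity = exp(mean cross-entropy)
```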