Data-efficient Large Vision Models through Sequential Autoregression
Authors: Zhiwei Hao, Jianyuan Guo, Chengcheng Wang, Yehui Tang, Han Wu, Han Hu, Kai Han, Chang Xu
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical evaluations underscore the model's agility in adapting to various tasks, heralding a significant reduction in the parameter footprint, and a marked decrease in training data requirements, thereby paving the way for more sustainable and accessible advancements in the field of generalist vision models. |
| Researcher Affiliation | Collaboration | Zhiwei Hao* 1, Jianyuan Guo* 2, Chengcheng Wang* 3, Yehui Tang 3, Han Wu 2, Han Hu B 1, Kai Han 3, Chang Xu B 2. Emails: haozhw@bit.edu.cn; {jianyuan.guo,han.wu}@sydney.edu.au; {chengcheng.wang,yehui.tang,kai.han}@huawei.com. Affiliations: 1 School of Information and Electronics, Beijing Institute of Technology, Beijing, China; 2 School of Computer Science, Faculty of Engineering, University of Sydney, Sydney, Australia; 3 Huawei Noah's Ark Lab, Beijing, China. |
| Pseudocode | No | The paper describes its methods using prose and diagrams but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/ggjy/DeLVM. |
| Open Datasets | Yes | Specifically, we enhance underrepresented (tail) datasets by randomly augmenting the training samples. We find that this approach yields stronger performance compared to traditional re-sampling strategy. ... we utilize various subsets of the SA-1B (Kirillov et al., 2023)... |
| Dataset Splits | Yes | Cross-entropy loss and perplexity on a withheld subset of SA-1B (equivalent to 1% of the dataset) serve as the metrics. To evaluate the performance of our trained model, we use a withheld subset of SA-1B, along with the MPII dataset (Andriluka et al., 2014) and the Test2800 dataset (Fu et al., 2017b). |
| Hardware Specification | Yes | Our training strategy adheres to the implementation of LVM (Bai et al., 2023), with slight adjustments made for efficient training with 8-16 A100 GPUs. |
| Software Dependencies | No | The paper states 'Our models are trained based on the InternLM framework (Team, 2023)' but does not provide specific version numbers for this framework or any other software dependencies such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | Table 8: Detailed configurations for training efficient LVMs. We attain a consistent equivalent batch size across different models by adjusting the number of employed GPUs, mini-batch size, and gradient accumulation steps. Configuration values: optimizer: AdamW; learning rate: 1.5e-4; weight decay: 0.1; optimizer momentum: β1, β2 = 0.9, 0.95; equivalent batch size (tokens): 262144; learning rate schedule: cosine; warmup steps: #total steps × 0.0056; final learning rate: 1.5e-5; context length: 2048; data augmentation: RandomResizedCrop. |
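
The training recipe quoted in the Experiment Setup row (AdamW with β1, β2 = 0.9, 0.95, weight decay 0.1, a peak learning rate of 1.5e-4 decaying on a cosine schedule to 1.5e-5 after a short warmup) maps onto standard PyTorch components. The sketch below is only an illustration under that assumption, not the authors' InternLM-based implementation; the stand-in model, the total step count, and the linear-warmup shape are placeholders.

```python
import math
import torch

# Stand-in model; the actual DeLVM backbone (an autoregressive vision model) is not reproduced here.
model = torch.nn.Linear(2048, 2048)

peak_lr, final_lr = 1.5e-4, 1.5e-5
total_steps = 10_000                      # hypothetical; set to the real training length
warmup_steps = int(total_steps * 0.0056)  # "warmup steps = #total steps * 0.0056"

optimizer = torch.optim.AdamW(
    model.parameters(), lr=peak_lr, betas=(0.9, 0.95), weight_decay=0.1
)

def lr_lambda(step: int) -> float:
    """Warm up to the peak LR, then decay on a cosine down to final_lr."""
    if step < warmup_steps:
        return (step + 1) / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return (final_lr + (peak_lr - final_lr) * cosine) / peak_lr

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```

With a context length of 2048 tokens, the equivalent batch size of 262144 tokens corresponds to 128 sequences per optimizer step, which the paper reports is held constant by trading off GPU count, mini-batch size, and gradient-accumulation steps.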
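The Open Datasets row quotes the paper's data-balancing choice: underrepresented (tail) datasets are enhanced by randomly augmenting their training samples rather than by traditional re-sampling. The sketch below illustrates that idea using torchvision's RandomResizedCrop (the augmentation listed in Table 8); the wrapper class, crop size, and (image, label) sample format are assumptions, not the authors' data pipeline.

```python
import random
from torch.utils.data import Dataset
from torchvision import transforms

# RandomResizedCrop matches the augmentation listed in Table 8; the crop size is assumed.
augment = transforms.RandomResizedCrop(256)

class AugmentedOversample(Dataset):
    """Expose a small (tail) dataset at a larger virtual length.

    Indices beyond the original length return randomly augmented variants of
    existing samples, rather than identical re-sampled copies.
    """

    def __init__(self, base: Dataset, target_len: int):
        self.base = base
        self.target_len = max(target_len, len(base))

    def __len__(self) -> int:
        return self.target_len

    def __getitem__(self, idx: int):
        if idx < len(self.base):
            return self.base[idx]
        image, label = self.base[random.randrange(len(self.base))]
        return augment(image), label

# Hypothetical usage: pad a tail dataset up to the size of the largest (head) dataset.
# balanced_tail = AugmentedOversample(tail_dataset, target_len=len(head_dataset))
```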
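The Dataset Splits row reports cross-entropy loss and perplexity on a withheld 1% subset of SA-1B; perplexity is the exponential of the mean per-token cross-entropy. The evaluation loop below sketches that relation for an autoregressive model over visual tokens; the model interface, dataloader format, and device handling are assumptions rather than the authors' evaluation code.

```python
import math
import torch
import torch.nn.functional as F

@torch.no_grad()
def evaluate(model, loader, device="cuda"):
    """Return (mean cross-entropy, perplexity) over a withheld token set."""
    total_loss, total_tokens = 0.0, 0
    for tokens in loader:                  # tokens: (batch, seq_len) visual-token ids
        tokens = tokens.to(device)
        logits = model(tokens[:, :-1])     # predict each next token autoregressively
        loss = F.cross_entropy(
            logits.reshape(-1, logits.size(-1)),
            tokens[:, 1:].reshape(-1),
            reduction="sum",
        )
        total_loss += loss.item()
        total_tokens += tokens[:, 1:].numel()
    mean_ce = total_loss / total_tokens
    return mean_ce, math.exp(mean_ce)      # perplexity = exp(mean cross-entropy)
```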