InfoBatch: Lossless Training Speed Up by Unbiased Dynamic Data Pruning
Authors: Ziheng Qin, Kai Wang, Zangwei Zheng, Jianyang Gu, Xiangyu Peng, Xu Zhao Pan, Daquan Zhou, Lei Shang, Baigui Sun, Xuansong Xie, Yang You
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We verify the effectiveness of our method on multiple datasets: CIFAR-10/100 (Krizhevsky et al., a;b), ImageNet-1K (Deng et al., 2009), ADE20K (Zhou et al., 2017) and FFHQ (Karras et al., 2019). InfoBatch consistently obtains lossless training results on classification, semantic segmentation, vision pretraining, and instruction fine-tuning tasks. |
| Researcher Affiliation | Collaboration | 1National University of Singapore 2Alibaba Group {zihengq, kai.wang, youy}@comp.nus.edu.sg |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is publicly available at NUS-HPC-AI-Lab/InfoBatch. |
| Open Datasets | Yes | We verify the effectiveness of our method on multiple datasets: CIFAR-10/100 (Krizhevsky et al., a;b), ImageNet-1K (Deng et al., 2009), ADE20K (Zhou et al., 2017) and FFHQ (Karras et al., 2019). |
| Dataset Splits | Yes | ImageNet-1K is a subset of the ImageNet-21k dataset with 1,000 categories. It contains 1,281,167 training images and 50,000 validation images. |
| Hardware Specification | Yes | Results are reported with ResNet-50 under a 40% prune ratio for 90 epochs on an 8-A100 GPU server. We use V100 for this experiment... |
| Software Dependencies | No | The paper states using "PyTorch (Paszke et al., 2019)", "Timm (Wightman et al., 2021)", and "mmsegmentation (Contributors, 2020)" but does not specify their version numbers. |
| Experiment Setup | Yes | For InfoBatch, default values r = 0.5 and δ = 0.875 are used if not specified. For classification tasks... all models are trained with a OneCycle scheduler (cosine annealing)... using the default setting and an SGD/LARS optimizer... with momentum 0.9 and weight decay 5e-4. All images are augmented with commonly adopted transformations, i.e. normalization, random crop, and horizontal flip... and "LARS uses a max learning rate of 2.3 for the OneCycle scheduler under a batch size of 128, and a maximum learning rate of 5.62 for a batch size of 256." |
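
The experiment-setup row above lists concrete hyperparameters, so a minimal PyTorch sketch of that training recipe may be useful for reproduction. Only the momentum (0.9), weight decay (5e-4), OneCycle/cosine schedule, and the augmentation list (normalization, random crop, horizontal flip) come from the paper text quoted above; the CIFAR-10 dataset choice, ResNet-18 backbone, batch size 128, max learning rate 0.1 for SGD, crop/normalization constants, and epoch count are illustrative assumptions (the paper only gives max learning rates for LARS). The InfoBatch pruning step itself (r = 0.5, δ = 0.875) is left as a comment because its API is not described in this summary.

```python
# Minimal sketch of the reported classification setup, assuming PyTorch/torchvision.
import torch
from torch import nn, optim
from torchvision import datasets, transforms
from torchvision.models import resnet18

# Augmentations named in the paper: normalization, random crop, horizontal flip.
# Crop size/padding and normalization statistics are assumed CIFAR-10 defaults.
train_tf = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465),
                         (0.2470, 0.2435, 0.2616)),
])

train_set = datasets.CIFAR10("./data", train=True, download=True, transform=train_tf)
loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True, num_workers=4)

model = resnet18(num_classes=10)  # stand-in backbone; the paper reports ResNet-50 on ImageNet

epochs = 200  # illustrative; the paper trains ImageNet ResNet-50 for 90 epochs
optimizer = optim.SGD(model.parameters(), lr=0.1,        # base/max LR for SGD assumed
                      momentum=0.9, weight_decay=5e-4)   # values stated in the paper
scheduler = optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=0.1,
    epochs=epochs, steps_per_epoch=len(loader),
    anneal_strategy="cos")                               # "OneCycle scheduler (cosine annealing)"

# Per-sample losses (reduction="none") are kept because InfoBatch's unbiased
# rescaling operates on individual sample losses before they are averaged.
criterion = nn.CrossEntropyLoss(reduction="none")

for epoch in range(epochs):
    for images, targets in loader:
        optimizer.zero_grad()
        losses = criterion(model(images), targets)
        # InfoBatch would prune low-loss samples and rescale the remaining losses
        # here (defaults r = 0.5, delta = 0.875); omitted since its exact API is
        # not described in this summary.
        losses.mean().backward()
        optimizer.step()
        scheduler.step()
```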