InfoBatch: Lossless Training Speed Up by Unbiased Dynamic Data Pruning

Authors: Ziheng Qin, Kai Wang, Zangwei Zheng, Jianyang Gu, Xiangyu Peng, Xu Zhao Pan, Daquan Zhou, Lei Shang, Baigui Sun, Xuansong Xie, Yang You

ICLR 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We verify the effectiveness of our method on multiple datasets: CIFAR-10/100 (Krizhevsky et al., a;b), ImageNet-1K (Deng et al., 2009), ADE20K (Zhou et al., 2017) and FFHQ (Karras et al., 2019). InfoBatch consistently obtains lossless training results on classification, semantic segmentation, vision pretraining, and instruction fine-tuning tasks. |
| Researcher Affiliation | Collaboration | 1 National University of Singapore, 2 Alibaba Group; {zihengq, kai.wang, youy}@comp.nus.edu.sg |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is publicly available at NUS-HPC-AI-Lab/InfoBatch. |
| Open Datasets | Yes | We verify the effectiveness of our method on multiple datasets: CIFAR-10/100 (Krizhevsky et al., a;b), ImageNet-1K (Deng et al., 2009), ADE20K (Zhou et al., 2017) and FFHQ (Karras et al., 2019). |
| Dataset Splits | Yes | ImageNet-1K is the subset of the ImageNet-21K dataset with 1,000 categories. It contains 1,281,167 training images and 50,000 validation images. |
| Hardware Specification | Yes | Results are reported with ResNet-50 under 40% prune ratio for 90 epochs on an 8-A100 GPU server. We use V100 for this experiment... |
| Software Dependencies | No | The paper states using "PyTorch (Paszke et al., 2019)", "Timm (Wightman et al., 2021)", and "mmsegmentation (Contributors, 2020)" but does not specify their version numbers. |
| Experiment Setup | Yes | For InfoBatch, default values r = 0.5 and δ = 0.875 are used if not specified. For classification tasks... all models are trained with the OneCycle scheduler (cosine annealing)... using the default setting and SGD/LARS optimizer... with momentum 0.9, weight decay 5e-4. All images are augmented with commonly adopted transformations, i.e. normalization, random crop, and horizontal flip... and "LARS uses a max learning rate 2.3 for the OneCycle scheduler under the batch size of 128, and a maximum learning rate of 5.62 for a batch size of 256." (Illustrative sketches follow this table.) |
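The defaults r = 0.5 and δ = 0.875 quoted above are InfoBatch's pruning probability and annealing fraction. Since the paper provides no structured pseudocode, the snippet below is a minimal, hedged sketch of the soft-pruning idea as described in the paper's abstract-level summary: each epoch, samples whose recorded loss is below the current mean are dropped with probability r, the surviving low-loss samples have their loss rescaled by 1/(1 − r) so the expected gradient stays unbiased, and pruning is disabled for the final (1 − δ) fraction of epochs. The class name `SoftPruningScores` and every implementation detail here are illustrative assumptions; the authors' actual implementation lives in the NUS-HPC-AI-Lab/InfoBatch repository.

```python
import torch


class SoftPruningScores:
    """Minimal sketch of loss-based soft data pruning with rescaling.

    Hedged reading of the method: keep one loss score per sample; each epoch,
    drop below-mean samples with probability r, weight the surviving
    below-mean samples by 1 / (1 - r), and stop pruning for the last
    (1 - delta) fraction of the epochs.
    """

    def __init__(self, num_samples: int, r: float = 0.5, delta: float = 0.875,
                 total_epochs: int = 200):
        # Unseen samples start at +inf so they are never treated as "well learned".
        self.scores = torch.full((num_samples,), float("inf"))
        self.weights = torch.ones(num_samples)
        self.r, self.delta, self.total_epochs = r, delta, total_epochs

    def epoch_indices(self, epoch: int) -> torch.Tensor:
        """Return shuffled sample indices for this epoch and refresh loss weights."""
        n = self.scores.numel()
        self.weights.fill_(1.0)
        if epoch >= int(self.delta * self.total_epochs):
            # Annealing phase: train on the full dataset with unit weights.
            return torch.randperm(n)
        finite = self.scores[torch.isfinite(self.scores)]
        if finite.numel() == 0:
            return torch.randperm(n)  # nothing scored yet (first epoch)
        well_learned = self.scores < finite.mean()
        dropped = well_learned & (torch.rand(n) < self.r)
        kept = ~dropped
        # Rescale kept low-loss samples to keep the gradient estimate unbiased.
        self.weights[well_learned & kept] = 1.0 / (1.0 - self.r)
        idx = kept.nonzero(as_tuple=True)[0]
        return idx[torch.randperm(idx.numel())]

    def update(self, indices: torch.Tensor, per_sample_loss: torch.Tensor) -> None:
        """Store the latest per-sample losses (detached) as pruning scores."""
        self.scores[indices] = per_sample_loss.detach().cpu()


# Usage inside a training step (per-sample losses, then weighted mean):
# losses = torch.nn.functional.cross_entropy(logits, targets, reduction="none")
# pruner.update(batch_indices, losses)
# loss = (losses * pruner.weights[batch_indices].to(losses.device)).mean()
# loss.backward()
```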
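For the classification setup quoted in the table (OneCycle scheduler with cosine annealing, SGD with momentum 0.9 and weight decay 5e-4, random crop, horizontal flip, and normalization), a minimal PyTorch sketch is below. The backbone, max learning rate, step counts, crop size, and normalization statistics are placeholders rather than values from the paper, and LARS is omitted because it is not part of core PyTorch.

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import OneCycleLR
from torchvision import transforms
from torchvision.models import resnet18

# Augmentations named in the paper: random crop, horizontal flip, normalization.
# The crop size and normalization constants below are CIFAR-style placeholders.
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])

model = resnet18(num_classes=10)  # placeholder backbone
optimizer = SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)

# OneCycle schedule with cosine annealing; max_lr and steps_per_epoch are placeholders.
epochs, steps_per_epoch = 200, 391
scheduler = OneCycleLR(optimizer, max_lr=0.1, epochs=epochs,
                       steps_per_epoch=steps_per_epoch, anneal_strategy="cos")

# Per batch: optimizer.step(); scheduler.step()
```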