DropIT: Dropping Intermediate Tensors for Memory-Efficient DNN Training

Authors: Joya Chen, Kai Xu, Yuhui Wang, Yifei Cheng, Angela Yao

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that we can drop up to 90% of the intermediate tensor elements in fully-connected and convolutional layers while achieving higher testing accuracy for Visual Transformers and Convolutional Neural Networks on various tasks (e.g., classification, object detection, instance segmentation). Our code and models are available at https://github.com/chenjoya/dropit.
Researcher Affiliation | Academia | Joya Chen (1), Kai Xu (1), Yuhui Wang (1), Yifei Cheng (2), Angela Yao (1); 1: National University of Singapore, 2: University of Science and Technology of China
Pseudocode | No | The paper describes the method and uses figures for illustration, but it does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code and models are available at https://github.com/chenjoya/dropit.
Open Datasets | Yes | In this section, we present a comprehensive evaluation of DropIT's effectiveness, leveraging experiments on training from scratch on ImageNet-1k (Russakovsky et al., 2015). Our results demonstrate that DropIT outperforms existing methods by achieving lower training loss, higher testing accuracy, and reduced GPU memory consumption. We showcase the versatility of DropIT in various fine-tuning scenarios, such as ImageNet-1k to CIFAR-100 (Krizhevsky et al., 2009), object detection, and instance segmentation on MS-COCO (Lin et al., 2014).
Dataset Splits | Yes | Table 2: Ablation study on dropping strategy and dropping rate. Reported results are top-1 accuracy on the ImageNet-1k validation set, achieved by DeiT-Ti training from scratch on the ImageNet-1k training set.
Hardware Specification | Yes | We measure training speed and memory on NVIDIA RTX A5000 GPUs.
Software Dependencies | Yes | Our implementation is based on PyTorch 1.12 (Paszke et al., 2019), and we utilize the torch.autograd package.
Experiment Setup | Yes | A.8 MORE EXPERIMENTAL DETAILS: We list the detailed key training hyper-parameters, though they are exactly the same as the official implementations: DeiT-Ti, training from scratch, ImageNet-1k, w/ or w/o DropIT: batch size 1024, AdamW optimizer, learning rate 10^-3, weight decay 0.05, cosine LR schedule, 300 epochs, with automatic mixed precision (AMP) training.
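
The Research Type and Software Dependencies rows above describe the core mechanism: the intermediate tensors cached for the backward pass are sparsified (up to 90% of elements dropped), implemented on top of PyTorch's torch.autograd package. Below is a minimal, hypothetical sketch of that idea for a single 2-D linear layer, keeping only the largest-magnitude elements of the cached input. The class name DropITLinearSketch, the keep_ratio parameter, and the top-k selection details are assumptions for illustration; this is not the authors' implementation, which is available at the repository linked above.

import torch


class DropITLinearSketch(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, weight, keep_ratio=0.1):
        # Forward pass uses the full, dense input x of shape (batch, in_features).
        out = x.matmul(weight.t())

        # Cache only the top keep_ratio fraction of |x| for the backward pass.
        flat = x.reshape(-1)
        k = max(1, int(keep_ratio * flat.numel()))
        _, indices = flat.abs().topk(k)
        ctx.save_for_backward(flat[indices], indices, weight)
        ctx.x_shape = x.shape
        return out

    @staticmethod
    def backward(ctx, grad_out):
        kept_values, indices, weight = ctx.saved_tensors
        # Rebuild an approximation of x with the dropped elements set to zero.
        x_approx = torch.zeros(
            ctx.x_shape.numel(), dtype=kept_values.dtype, device=kept_values.device
        )
        x_approx[indices] = kept_values
        x_approx = x_approx.reshape(ctx.x_shape)

        grad_x = grad_out.matmul(weight)          # exact: does not need the cached x
        grad_w = grad_out.t().matmul(x_approx)    # approximated via the sparsified x
        return grad_x, grad_w, None


# Usage sketch:
# x = torch.randn(8, 64, requires_grad=True)
# w = torch.randn(32, 64, requires_grad=True)
# y = DropITLinearSketch.apply(x, w, 0.1)
# y.sum().backward()

Note that for a linear layer the input gradient does not need the cached activation at all; only the weight gradient uses the sparsified copy, which is why dropping most elements trades a small amount of gradient accuracy for a much smaller cached tensor.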
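
The Experiment Setup row quotes the DeiT-Ti from-scratch recipe (batch size 1024, AdamW, learning rate 10^-3, weight decay 0.05, cosine LR schedule, 300 epochs, AMP). The snippet below is a hedged sketch of how that optimizer, scheduler, and AMP configuration could be set up in PyTorch; the build_training_setup name and the omitted model/dataloader plumbing are assumptions, and details of the official DeiT recipe such as warmup and augmentation are not reproduced here.

import torch

def build_training_setup(model, epochs=300, lr=1e-3, weight_decay=0.05):
    # AdamW with the quoted learning rate and weight decay.
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=weight_decay)
    # Cosine learning-rate schedule over the full training run.
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    # Gradient scaler for automatic mixed precision (AMP) training.
    scaler = torch.cuda.amp.GradScaler()
    return optimizer, scheduler, scaler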