DropIT: Dropping Intermediate Tensors for Memory-Efficient DNN Training
Authors: Joya Chen, Kai Xu, Yuhui Wang, Yifei Cheng, Angela Yao
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that we can drop up to 90% of the intermediate tensor elements in fully-connected and convolutional layers while achieving higher testing accuracy for Visual Transformers and Convolutional Neural Networks on various tasks (e.g., classification, object detection, instance segmentation). Our code and models are available at https://github.com/chenjoya/dropit. (A rough code sketch of this dropping idea follows the table.) |
| Researcher Affiliation | Academia | Joya Chen¹, Kai Xu¹, Yuhui Wang¹, Yifei Cheng², Angela Yao¹ (¹National University of Singapore, ²University of Science and Technology of China) |
| Pseudocode | No | The paper describes the method and uses figures for illustration, but it does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code and models are available at https://github.com/chenjoya/dropit. |
| Open Datasets | Yes | In this section, we present a comprehensive evaluation of DropIT's effectiveness, leveraging experiments on training from scratch on ImageNet-1k (Russakovsky et al., 2015). Our results demonstrate that DropIT outperforms existing methods by achieving lower training loss, higher testing accuracy, and reduced GPU memory consumption. We showcase the versatility of DropIT in various fine-tuning scenarios, such as ImageNet-1k to CIFAR-100 (Krizhevsky et al., 2009), object detection, and instance segmentation on MS-COCO (Lin et al., 2014). |
| Dataset Splits | Yes | Table 2: Ablation study on dropping strategy and dropping rate. Reported results are top-1 accuracy on the ImageNet-1k validation set, achieved by DeiT-Ti training from scratch on the ImageNet-1k training set. |
| Hardware Specification | Yes | We measure training speed and memory on NVIDIA RTX A5000 GPUs. |
| Software Dependencies | Yes | Our implementation is based on PyTorch 1.12 (Paszke et al., 2019), and we utilize the torch.autograd package. |
| Experiment Setup | Yes | A.8 MORE EXPERIMENTAL DETAILS: We list the detailed key training hyper-parameters, though they are identical to the official implementations: DeiT-Ti, training from scratch, ImageNet-1k, w/ and w/o DropIT: batch size 1024, AdamW optimizer, learning rate 10⁻³, weight decay 0.05, cosine LR schedule, 300 epochs, with automatic mixed precision (AMP) training. (A hedged sketch of this training recipe follows the table.) |
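The dropping idea quoted in the Research Type row (caching only a small fraction of each intermediate tensor for the backward pass) can be illustrated with a custom `torch.autograd.Function`, which the paper says its implementation builds on. The sketch below is not the authors' implementation (see their repository for that): the class name `DropITLinear`, the `keep_ratio` argument, the top-k-by-magnitude selection, and the 2D-input assumption are all illustrative.

```python
import torch
import torch.nn.functional as F

class DropITLinear(torch.autograd.Function):
    """Minimal sketch: cache only the top-k elements (by magnitude) of the
    input activation instead of the full tensor. Assumes a 2D input of shape
    (batch, in_features); keep_ratio = 0.1 mimics dropping 90% of elements."""

    @staticmethod
    def forward(ctx, x, weight, bias, keep_ratio=0.1):
        y = F.linear(x, weight, bias)
        # Keep only keep_ratio of the intermediate tensor elements for backward.
        flat = x.reshape(-1)
        k = max(1, int(keep_ratio * flat.numel()))
        _, idx = flat.abs().topk(k)
        ctx.save_for_backward(flat[idx], idx, weight)
        ctx.x_shape = x.shape
        return y

    @staticmethod
    def backward(ctx, grad_out):
        vals, idx, weight = ctx.saved_tensors
        # Rebuild a sparse approximation of the cached activation.
        x_hat = torch.zeros(ctx.x_shape, dtype=vals.dtype, device=vals.device)
        x_hat.reshape(-1)[idx] = vals
        grad_x = grad_out @ weight        # (batch, in_features)
        grad_w = grad_out.t() @ x_hat     # (out_features, in_features)
        grad_b = grad_out.sum(dim=0)      # (out_features,)
        return grad_x, grad_w, grad_b, None

# Usage: only 10% of x is cached; the weight gradient uses the sparse cache.
x = torch.randn(8, 64, requires_grad=True)
w = torch.randn(32, 64, requires_grad=True)
b = torch.zeros(32, requires_grad=True)
DropITLinear.apply(x, w, b, 0.1).sum().backward()
```

Note that in this sketch the gradient with respect to the input only needs the weight matrix and is therefore exact; only the weight gradient is approximated by the sparsified cache, which is where the memory saving comes from.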
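Similarly, the hyper-parameters quoted in the Experiment Setup row correspond to a standard PyTorch training recipe. The sketch below wires them into a generic loop under assumptions of my own: the tiny stand-in model and synthetic loader replace DeiT-Ti and ImageNet-1k, and any additional details of the official DeiT implementation are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

device = "cuda" if torch.cuda.is_available() else "cpu"

# Stand-ins for DeiT-Ti and the ImageNet-1k loader (illustrative only).
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 1000)).to(device)
train_loader = DataLoader(
    TensorDataset(torch.randn(64, 3, 32, 32), torch.randint(0, 1000, (64,))),
    batch_size=16)  # the paper trains with an effective batch size of 1024

epochs = 300  # cosine LR schedule over 300 epochs, as in the quoted recipe
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.05)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))  # AMP training

for epoch in range(epochs):
    for images, targets in train_loader:
        images, targets = images.to(device), targets.to(device)
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(enabled=(device == "cuda")):
            loss = F.cross_entropy(model(images), targets)
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
    scheduler.step()
```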