Budgeted Training: Rethinking Deep Neural Network Training Under Resource Constraints

Authors: Mengtian Li, Ersin Yumer, Deva Ramanan

Venue: ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We support our claim through extensive experiments with state-of-the-art models on ImageNet (image classification), Kinetics (video classification), MS COCO (object detection and instance segmentation), and Cityscapes (semantic segmentation).
Researcher Affiliation | Collaboration | Mengtian Li (Carnegie Mellon University, mtli@cs.cmu.edu); Ersin Yumer (Uber ATG, meyumer@gmail.com); Deva Ramanan (CMU & Argo AI, deva@cs.cmu.edu)
Pseudocode | No | The paper does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks or figures.
Open Source Code | No | The paper mentions adapting existing open-source codebases for their experiments (e.g., 'We adapt both the network architecture (ResNet-18) and the data loader from the open source PyTorch ImageNet example...'), but it does not state that their own developed methodology or contributions are open-source, nor does it provide a link to their specific implementation code.
Open Datasets | Yes | CIFAR-10 (Krizhevsky & Hinton, 2009) is a dataset that contains 60,000 tiny images (32×32). ImageNet (Russakovsky et al., 2015) is a widely adopted standard for the image classification task. MS COCO (Lin et al., 2014) is a widely recognized benchmark for object detection and instance segmentation. Cityscapes (Cordts et al., 2016) is a dataset commonly used for evaluating semantic segmentation algorithms. Kinetics (Kay et al., 2017) is a large-scale dataset of YouTube videos focusing on human actions.
Dataset Splits | Yes | We follow the standard setup for dataset split (Huang et al., 2017b), which is randomly holding out 5,000 from the 50,000 training images to form the validation set. (A split sketch follows the table.)
Hardware Specification | No | The paper mentions the number of GPUs used for training (e.g., 'training using 4 GPUs', 'train with 8 GPUs') and notes the use of asynchronous or synchronous batch normalization, but it does not specify the particular GPU models (e.g., NVIDIA V100, A100) or any details about CPU or memory specifications.
Software Dependencies | Yes | We adapt both the network architecture (ResNet-18) and the data loader from the open source PyTorch ImageNet example. PyTorch version 0.4.1. We use an open source codebase that has training and data processing code publicly available. Caffe2 version 0.8.1. We use the open source implementation of Mask R-CNN, which is a PyTorch re-implementation of the official codebase Detectron in the Caffe2 framework. PyTorch version 0.4.1.
Experiment Setup | Yes | We use ResNet-18 (He et al., 2016) as the backbone architecture and utilize SGD with base learning rate 0.1, momentum 0.9, weight decay 0.0005 and a batch size of 128. For training, we adopt the 1x schedule (90k iterations)... We train with 8 GPUs (batch size 16) and keep the built-in learning rate warm up mechanism... (A training-setup sketch follows the table.)
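
The Dataset Splits row quotes a random hold-out of 5,000 of the 50,000 CIFAR-10 training images for validation. Below is a minimal sketch of such a split using the standard torchvision CIFAR-10 loader; the paper follows the setup of Huang et al. (2017b) but does not give its split code, so the seed and index logic here are assumptions for illustration only.

import torch
from torch.utils.data import Subset
from torchvision import datasets, transforms

# Load the full 50,000-image CIFAR-10 training set.
train_full = datasets.CIFAR10(root="./data", train=True, download=True,
                              transform=transforms.ToTensor())

# Randomly hold out 5,000 images for validation (the seed is illustrative).
perm = torch.randperm(len(train_full),
                      generator=torch.Generator().manual_seed(0))
val_set = Subset(train_full, perm[:5000].tolist())
train_set = Subset(train_full, perm[5000:].tolist())  # remaining 45,000 images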
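
The Experiment Setup row quotes ResNet-18 trained with SGD at base learning rate 0.1, momentum 0.9, and weight decay 0.0005. The sketch below wires up that optimizer configuration; the model constructor, the linearly decaying schedule, and the 90k-iteration budget (quoted only for the detection 1x schedule) are assumptions, and the paper's actual budget-aware learning rate schedules are not reproduced here.

import torch
from torchvision import models

model = models.resnet18(num_classes=1000)
optimizer = torch.optim.SGD(model.parameters(),
                            lr=0.1,             # base learning rate as quoted
                            momentum=0.9,
                            weight_decay=5e-4)  # 0.0005 as quoted

# Illustrative schedule decaying the learning rate linearly to zero over a
# fixed iteration budget (budget value is an assumption, see lead-in above).
total_iters = 90000
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda it: max(0.0, 1.0 - it / total_iters))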