Exploiting Invariance in Training Deep Neural Networks

Authors: Chengxi Ye, Xiong Zhou, Tristan McKinney, Yanfeng Liu, Qinggang Zhou, Fedor Zhdanov

AAAI 2022, pp. 8849-8856

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Tested on convolutional networks and transformer networks, our proposed technique requires fewer iterations to train, surpasses all baselines by a large margin, seamlessly works on both small and large batch size training, and applies to different computer vision and language tasks."
Researcher Affiliation | Collaboration | Amazon Web Services; {chengxye, xiongzho, tristamc, liuyanfe, qingganz}@amazon.com; fedor.zhdanov@rhul.ac.uk
Pseudocode | Yes | Algorithm 1: Proposed Computation for a Convolution Layer (an illustrative sketch of such a layer follows the table)
Open Source Code | No | The paper does not provide a direct link to, or an explicit statement about, publicly available code for its proposed method (ND++).
Open Datasets | Yes | "ImageNet (Deng et al. 2009), MS COCO (Lin et al. 2014), and Cityscapes (Cordts et al. 2016) datasets for image classification, object detection, and semantic segmentation, respectively (Fig. 1). In our supplementary materials, we also show promising results for training transformers on multiple vision and language tasks."
Dataset Splits | No | The paper refers to 'validation accuracy' and 'validation mIoU' and mentions standard training lengths (e.g., '90-epoch training'), but it does not provide explicit split percentages, validation sample counts, or instructions for reproducing the data partitioning.
Hardware Specification | Yes | "On an 8-GPU machine, the synchronization cost is negligible. This implementation allows us to collect reliable statistics throughout the training for all practical batch sizes. In our development, we have consistently achieved satisfactory results using per-GPU batch sizes ranging from 2 to 1024." (a sketch of cross-GPU statistic synchronization follows the table)
Software Dependencies | No | The paper mentions the 'official PyTorch recipe' and provides 'Matlab code' snippets, but it does not specify version numbers for PyTorch or any other software libraries or dependencies.
Experiment Setup | Yes | "With ND++, we have seamlessly increased the training to batch size 2048, eight times larger than the model zoo default setting of 256. We train in one-eighth the number of iterations but produce superior models (Table 1)... We train for 200 epochs for all experiments unless stated otherwise... We perform fine-tuning experiments with a ResNet-50 backbone pretrained on ImageNet with initial learning rates of 0.01 and 0.1 and momentum 0.9 for both ND++ and the SyncBN baseline... set the learning rate of SGD to 0.1 and the weight decay to 0.0001, and use a batch size of 512 when training from scratch." (an illustrative training-configuration sketch follows the table)
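
The "Pseudocode" row points to Algorithm 1 (Proposed Computation for a Convolution Layer), which is not reproduced on this page. As a rough illustration of what a normalized convolution computation of this kind can look like, here is a minimal PyTorch sketch that centers and whitens im2col patches before applying the convolution weights. The class name `WhitenedConv2d`, the eigendecomposition-based whitening, and the hyperparameters are assumptions made for illustration; this is not the paper's exact Algorithm 1.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class WhitenedConv2d(nn.Module):
    """Illustrative sketch: center and decorrelate im2col patches, then convolve.

    This only approximates the flavor of "Algorithm 1: Proposed Computation
    for a Convolution Layer"; the exact steps are defined in the paper.
    """

    def __init__(self, in_ch, out_ch, k=3, stride=1, padding=1, eps=1e-5):
        super().__init__()
        self.k, self.stride, self.padding, self.eps = k, stride, padding, eps
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch * k * k) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_ch))

    def forward(self, x):
        b, c, h, w = x.shape
        # Unfold the input into patches: (B, C*k*k, L) with L spatial positions.
        patches = F.unfold(x, self.k, padding=self.padding, stride=self.stride)
        d = patches.shape[1]
        flat = patches.transpose(1, 2).reshape(-1, d)      # (B*L, d)
        flat = flat - flat.mean(dim=0, keepdim=True)       # center the patches
        # Covariance of the patch features, regularized for invertibility.
        cov = flat.t() @ flat / flat.shape[0] + self.eps * torch.eye(d, device=x.device)
        # Inverse square root of the covariance = whitening transform.
        eigval, eigvec = torch.linalg.eigh(cov)
        whiten = eigvec @ torch.diag(eigval.clamp_min(self.eps).rsqrt()) @ eigvec.t()
        flat = flat @ whiten
        out = flat @ self.weight.t() + self.bias           # (B*L, out_ch)
        h_out = (h + 2 * self.padding - self.k) // self.stride + 1
        w_out = (w + 2 * self.padding - self.k) // self.stride + 1
        return (out.reshape(b, -1, self.weight.shape[0])
                   .transpose(1, 2)
                   .reshape(b, -1, h_out, w_out))
```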
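
The "Hardware Specification" row quotes the paper's statement that statistics are synchronized across an 8-GPU machine at negligible cost. The snippet below is a minimal sketch of how per-GPU feature statistics are commonly averaged with `torch.distributed.all_reduce`; the function name `sync_mean_cov` and the specific statistics (count, sum, second moment) are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.distributed as dist


def sync_mean_cov(flat):
    """Average feature statistics over all workers (illustrative only).

    `flat` is a (N, d) matrix of local features, e.g. im2col patches on one GPU.
    """
    n = torch.tensor([float(flat.shape[0])], device=flat.device)
    s = flat.sum(dim=0)        # local sum, shape (d,)
    ss = flat.t() @ flat       # local second moment, shape (d, d)
    if dist.is_available() and dist.is_initialized():
        for t in (n, s, ss):   # one all-reduce per statistic; cheap on one machine
            dist.all_reduce(t, op=dist.ReduceOp.SUM)
    mean = s / n
    cov = ss / n - torch.outer(mean, mean)
    return mean, cov
```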
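
The hyperparameters quoted in the "Experiment Setup" row (SGD with learning rate 0.1, momentum 0.9, weight decay 0.0001, and a batch size of 512 when training from scratch) map onto a standard PyTorch training configuration. The sketch below is illustrative only: the model, the random dataset, the epoch count, and the absence of a learning-rate schedule are placeholders, since the paper's full recipe (e.g., its 90- and 200-epoch schedules) is not reproduced here.

```python
import torch

# Stand-ins for the actual backbone and dataset; only the optimizer settings
# below are taken from the quoted experiment setup.
model = torch.nn.Linear(2048, 1000)
train_set = torch.utils.data.TensorDataset(
    torch.randn(4096, 2048), torch.randint(0, 1000, (4096,)))
loader = torch.utils.data.DataLoader(train_set, batch_size=512, shuffle=True)

# SGD with lr 0.1, momentum 0.9, weight decay 1e-4, as quoted above.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(2):            # shortened; the paper trains for 90-200 epochs
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
```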