On the Use of Anchoring for Training Vision Models

Authors: Vivek Sivaraman Narayanaswamy, Kowshik Thopalli, Rushil Anirudh, Yamen Mubarka, Wesam Sakla, Jayaraman J. Thiagarajan

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We empirically evaluate our proposed approach across datasets and architectures of varying scales and complexities, demonstrating substantial performance gains in generalization and safety metrics compared to the standard training protocol." |
| Researcher Affiliation | Collaboration | Vivek Narayanaswamy, Lawrence Livermore National Laboratory (narayanaswam1@llnl.gov); Kowshik Thopalli, Lawrence Livermore National Laboratory (thopalli1@llnl.gov); Rushil Anirudh, Amazon (rushil15anirudh@gmail.com); Yamen Mubarka, Lawrence Livermore National Laboratory (mubarka1@llnl.gov); Wesam Sakla, Lawrence Livermore National Laboratory (sakla1@llnl.gov); Jayaraman J. Thiagarajan, Lawrence Livermore National Laboratory (jjthiagarajan@gmail.com) |
| Pseudocode | Yes | "Figure 3: PyTorch-style pseudocode for our proposed approach." (A hedged sketch of the anchored forward pass appears after this table.) |
| Open Source Code | Yes | "The open-source code is available at https://software.llnl.gov/anchoring" |
| Open Datasets | Yes | "(i) CIFAR-10 and (ii) CIFAR-100 [13] datasets contain 50,000 training samples and 10,000 test samples each of size 32 × 32 belonging to 10 and 100 classes, respectively; (iii) ImageNet-1K [14] is a large-scale vision benchmark comprising 1.3 million training images and 50,000 validation images across 1000 diverse categories." |
| Dataset Splits | Yes | "ImageNet-1K [14] is a large-scale vision benchmark comprising 1.3 million training images and 50,000 validation images across 1000 diverse categories." |
| Hardware Specification | No | The paper does not report specific hardware details such as GPU/CPU models, processor types, or memory amounts used for its experiments; it mentions "high-capacity architectures" but names no specific hardware. |
| Software Dependencies | No | The paper mentions "PyTorch style pseudo code" and references https://pytorch.org/vision, implying the use of PyTorch, but it does not specify any software components with version numbers (e.g., PyTorch 1.x, Python 3.x). |
| Experiment Setup | Yes | "Choice of α. Through extensive empirical studies with multiple architectures, we found that using the masking schedule hyper-parameter α = 0.2 (corresponding to every 5th batch in an epoch) leads to stable convergence (closely matching the top-1 validation accuracy of standard training) on ImageNet, and α = 0.25 for CIFAR-10/100. Note that our approach performs reference masking for an entire batch as determined by α. We have included our analysis on the impact of the choice of α in Section A.1. Table 5 outlines the recipes (augmentations, epochs, optimizers) leveraged for model training." (A hedged sketch of this masking schedule appears after this table.) |
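
For readers skimming this report, the anchoring idea referenced in the pseudocode row can be summarized in a few lines. The sketch below is not the paper's Figure 3; it is a minimal PyTorch illustration of a standard anchored forward pass, assuming references are drawn by shuffling the batch and concatenated with the residual along the channel dimension (so a backbone for 3-channel images must accept 6 input channels). The `AnchoredModel` wrapper and the choice to compute the residual before zeroing the reference are assumptions of this sketch, not details confirmed by the report.

```python
import torch
import torch.nn as nn


class AnchoredModel(nn.Module):
    """Minimal anchoring wrapper (hypothetical; not the paper's Figure 3).

    The backbone's first layer must accept twice the usual number of
    input channels, since the reference and residual are concatenated
    channel-wise.
    """

    def __init__(self, backbone: nn.Module):
        super().__init__()
        self.backbone = backbone

    def forward(self, x: torch.Tensor, mask_reference: bool = False) -> torch.Tensor:
        # Draw a reference for each sample by shuffling the batch
        # (one common choice for the reference distribution).
        ref = x[torch.randperm(x.size(0), device=x.device)]
        # Anchoring reparameterizes x as the tuple (reference, residual).
        residual = x - ref
        if mask_reference:
            # Reference masking: zero the reference channels for the whole
            # batch while keeping the residual (ordering is an assumption).
            ref = torch.zeros_like(ref)
        return self.backbone(torch.cat([ref, residual], dim=1))
```

At inference, the same tuple construction is applied (often with fixed or averaged references); the official code at https://software.llnl.gov/anchoring documents the exact protocol.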
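Similarly, the α schedule in the experiment-setup row reads as "mask the reference for one entire batch out of every round(1/α) batches": every 5th batch for α = 0.2 on ImageNet, every 4th for α = 0.25 on CIFAR-10/100. Below is a hedged training-loop sketch under that reading; keeping the ordinary cross-entropy target on masked batches is an assumption here, and the paper or its released code should be consulted for the exact objective.

```python
import torch.nn as nn


def train_one_epoch(model, loader, optimizer, alpha: float = 0.2):
    """One epoch of anchored training with batch-level reference masking.

    alpha = 0.2 masks every 5th batch; alpha = 0.25 masks every 4th.
    Hypothetical loop: the loss on masked batches is assumed to be the
    standard cross-entropy, which this report does not confirm.
    """
    criterion = nn.CrossEntropyLoss()
    mask_every = max(1, round(1.0 / alpha))
    model.train()
    for step, (images, labels) in enumerate(loader):
        # Mask the reference for the entire batch on every
        # `mask_every`-th step, per the schedule quoted above.
        masked = (step % mask_every) == (mask_every - 1)
        logits = model(images, mask_reference=masked)
        loss = criterion(logits, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```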