HydraViT: Stacking Heads for a Scalable ViT

Authors: Janek Haberer, Ali Hojjat, Olaf Landsiedel

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental results demonstrate the efficacy of HydraViT in achieving a scalable ViT with up to 10 subnetworks, covering a wide range of resource constraints. HydraViT achieves up to 5 p.p. more accuracy with the same GMACs and up to 7 p.p. more accuracy with the same throughput on ImageNet-1K compared to the baselines, making it an effective solution for scenarios where hardware availability is diverse or varies over time. We assess all experiments and baselines on ImageNet-1K (Deng et al., 2009) at a resolution of 224×224.
Researcher Affiliation | Academia | Janek Haberer*, Ali Hojjat*, Olaf Landsiedel, Kiel University, Germany. *Equal contribution. {janek.haberer,ali.hojjat,olaf.landsiedel}@cs.uni-kiel.de
Pseudocode | Yes | Algorithm 1 (Stochastic dropout training). Data: HydraViT V_θk, number of batches N_batch, number of heads of the universal model H, uniform distribution U. For e_i = 1 to N_epoch: for b_i = 1 to N_batch: /* sample a subnetwork */ draw k ~ U({1, 2, ..., H}) and select the subnetwork V_θk; /* calculate single-objective loss */ compute L(V_θk(x_bi), y); back-propagate through subnetwork V_θk. A runnable sketch of this training loop is given after the table.
Open Source Code | Yes | The source code is available at https://github.com/ds-kiel/HydraViT.
Open Datasets | Yes | We assess all experiments and baselines on ImageNet-1K (Deng et al., 2009) at a resolution of 224×224.
Dataset Splits | No | The paper uses ImageNet-1K but does not explicitly state the training, validation, or test data splits (e.g., percentages or sample counts).
Hardware Specification | Yes | Evaluated on an NVIDIA A100 80GB PCIe.
Software Dependencies | No | We implement on top of timm (Wightman, 2019) and train according to the procedure of Touvron et al. (2021) but without knowledge distillation. No specific version numbers for software dependencies are provided.
Experiment Setup | Yes | For this experiment, we train HydraViT for 300, 400, and 500 epochs with a pre-trained DeiT-tiny checkpoint. We assess all experiments and baselines on ImageNet-1K (Deng et al., 2009) at a resolution of 224×224. We implement on top of timm (Wightman, 2019) and train according to the procedure of Touvron et al. (2021) but without knowledge distillation.
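
The pseudocode row above describes Algorithm 1, where each batch trains a randomly sampled subnetwork. The following is a minimal PyTorch sketch of that loop, not the authors' implementation: it assumes a hypothetical HydraViT-style model whose forward pass accepts a `num_heads=k` argument to restrict computation to the first k heads, and the optimizer/learning-rate choices are placeholders.

```python
import random
import torch
import torch.nn as nn

def train_stochastic_dropout(model, loader, num_heads_total, num_epochs, device="cuda"):
    """Sketch of Algorithm 1: stochastic dropout training over H stacked-head subnetworks."""
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4, weight_decay=0.05)
    model.to(device).train()

    for epoch in range(num_epochs):
        for images, targets in loader:
            images, targets = images.to(device), targets.to(device)

            # Sample a subnetwork: k ~ U({1, ..., H})
            k = random.randint(1, num_heads_total)

            # Single-objective loss on the sampled subnetwork only.
            # `model(images, num_heads=k)` is an assumed interface, not timm's API.
            logits = model(images, num_heads=k)
            loss = criterion(logits, targets)

            # Back-propagate through the sampled subnetwork; only the weights
            # shared by the first k heads receive gradients this step.
            optimizer.zero_grad(set_to_none=True)
            loss.backward()
            optimizer.step()
```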
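For the experiment-setup row, a minimal sketch of how the described starting point could be reproduced with timm is shown below. `deit_tiny_patch16_224` is timm's identifier for the DeiT-tiny checkpoint at 224×224; the augmentation hyperparameters are illustrative assumptions, not the paper's verified recipe (the paper follows Touvron et al. (2021) without knowledge distillation).

```python
import timm
import torch
from timm.data import create_transform

# Load the pre-trained DeiT-tiny checkpoint used as the training starting point.
model = timm.create_model("deit_tiny_patch16_224", pretrained=True)

# DeiT-style training augmentation at 224x224 (illustrative settings).
train_transform = create_transform(
    input_size=224,
    is_training=True,
    auto_augment="rand-m9-mstd0.5-inc1",
    interpolation="bicubic",
)

# Sanity check: a dummy forward pass at the paper's evaluation resolution.
with torch.no_grad():
    out = model(torch.randn(1, 3, 224, 224))
print(out.shape)  # torch.Size([1, 1000]) -- ImageNet-1K classes
```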