HydraViT: Stacking Heads for a Scalable ViT
Authors: Janek Haberer, Ali Hojjat, Olaf Landsiedel
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results demonstrate the efficacy of HydraViT in achieving a scalable ViT with up to 10 subnetworks, covering a wide range of resource constraints. HydraViT achieves up to 5 p.p. more accuracy with the same GMACs and up to 7 p.p. more accuracy with the same throughput on ImageNet-1K compared to the baselines, making it an effective solution for scenarios where hardware availability is diverse or varies over time. We assess all experiments and baselines on ImageNet-1K (Deng et al., 2009) at a resolution of 224×224. |
| Researcher Affiliation | Academia | Janek Haberer*, Ali Hojjat*, Olaf Landsiedel; Kiel University, Germany; *Equal contribution; {janek.haberer,ali.hojjat,olaf.landsiedel}@cs.uni-kiel.de |
| Pseudocode | Yes | Algorithm 1: Stochastic dropout training. Data: HydraViT V_{θ_k}, number of batches N_batch, number of heads of the universal model H, uniform distribution U. for e_i = 1 … N_epoch do: for b_i = 1 … N_batch do: /* sample a subnetwork */ draw k ~ U({1, 2, …, H}) and select V_{θ_k}; /* calculate single-objective loss */ L(V_{θ_k}(x_{b_i}), y); back-propagate through subnetwork V_{θ_k}; end; end. (See the training-loop sketch after the table.) |
| Open Source Code | Yes | The source code is available at https://github.com/ds-kiel/HydraViT. |
| Open Datasets | Yes | We assess all experiments and baselines on ImageNet-1K (Deng et al., 2009) at a resolution of 224×224. |
| Dataset Splits | No | The paper uses ImageNet-1K but does not explicitly state the training, validation, or test data splits (e.g., percentages or sample counts). |
| Hardware Specification | Yes | evaluated on NVIDIA A100 80GB PCIe. |
| Software Dependencies | No | We implement on top of timm (Wightman, 2019) and train according to the procedure of Touvron et al. (2021) but without knowledge distillation. No specific version numbers for software dependencies are provided. |
| Experiment Setup | Yes | For this experiment, we train HydraViT for 300, 400, and 500 epochs with a pre-trained DeiT-tiny checkpoint. We assess all experiments and baselines on ImageNet-1K (Deng et al., 2009) at a resolution of 224×224. We implement on top of timm (Wightman, 2019) and train according to the procedure of Touvron et al. (2021) but without knowledge distillation. (See the checkpoint-loading sketch after the table.) |
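For readers reproducing the run, the following is a minimal PyTorch-style sketch of the stochastic dropout training loop described in Algorithm 1. The `num_heads=k` forward argument is a hypothetical interface for restricting the model to its first k attention heads; the authors' actual subnetwork selection lives in their repository and may differ.

```python
import random
import torch
import torch.nn.functional as F

def train_stochastic_dropout(model, loader, optimizer, num_heads, num_epochs, device="cuda"):
    """Sketch of Algorithm 1: per batch, sample a subnetwork size k ~ U{1..H},
    compute a single-objective loss on that subnetwork only, and back-propagate."""
    model.train()
    for epoch in range(num_epochs):
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            # Sample a subnetwork uniformly: k heads out of the universal model's H heads.
            k = random.randint(1, num_heads)
            # Hypothetical keyword argument: forward pass restricted to the first k heads.
            logits = model(images, num_heads=k)
            loss = F.cross_entropy(logits, labels)
            optimizer.zero_grad()
            loss.backward()   # gradients flow only through the sampled subnetwork
            optimizer.step()
```

Because only one subnetwork is sampled per batch, each step optimizes a single objective, matching the single-objective loss L(V_{θ_k}(x_{b_i}), y) in the algorithm.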
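Likewise, a minimal sketch of the starting point for the stated experiment setup, assuming the standard timm model name `deit_tiny_patch16_224` for the pre-trained DeiT-tiny checkpoint; this only illustrates the checkpoint loading, not the authors' full training script.

```python
import timm
import torch

# Load a pre-trained DeiT-tiny checkpoint via timm as the starting point
# (the paper trains HydraViT from DeiT-tiny for 300/400/500 epochs on
# ImageNet-1K at 224x224, without knowledge distillation).
model = timm.create_model("deit_tiny_patch16_224", pretrained=True, num_classes=1000)

# Standard 224x224 input; a forward pass yields ImageNet-1K logits.
dummy = torch.randn(1, 3, 224, 224)
logits = model(dummy)
print(logits.shape)  # torch.Size([1, 1000])
```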