Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
HydraViT: Stacking Heads for a Scalable ViT
Authors: Janek Haberer, Ali Hojjat, Olaf Landsiedel
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results demonstrate the efficacy of HydraViT in achieving a scalable ViT with up to 10 subnetworks, covering a wide range of resource constraints. HydraViT achieves up to 5 p.p. more accuracy with the same GMACs and up to 7 p.p. more accuracy with the same throughput on ImageNet-1K compared to the baselines, making it an effective solution for scenarios where hardware availability is diverse or varies over time. We assess all experiments and baselines on ImageNet-1K (Deng et al., 2009) at a resolution of 224×224. |
| Researcher Affiliation | Academia | Janek Haberer*, Ali Hojjat*, Olaf Landsiedel Kiel University, Germany *Equal contribution EMAIL |
| Pseudocode | Yes | Algorithm 1: Stochastic dropout training. Data: HydraViT: V_{θ_k}, number of batches: N_batch, number of heads of the universal model: H, uniform distribution: U. for 1 ≤ e_i ≤ N_epoch do: for 1 ≤ b_i ≤ N_batch do: /* sample a subnetwork */ V_{θ_k}, k ~ U({1, 2, ..., H}); /* calculate single-objective loss */ L(V_{θ_k}(x_{b_i}), y); back-propagation through subnetwork V_{θ_k}; end; end |
| Open Source Code | Yes | The source code is available at https://github.com/ds-kiel/HydraViT. |
| Open Datasets | Yes | We assess all experiments and baselines on ImageNet-1K (Deng et al., 2009) at a resolution of 224×224. |
| Dataset Splits | No | The paper uses ImageNet-1K but does not explicitly state the training, validation, or test splits (e.g., percentages or sample counts). |
| Hardware Specification | Yes | Evaluated on an NVIDIA A100 80GB PCIe. |
| Software Dependencies | No | We implement on top of timm (Wightman, 2019) and train according to the procedure of Touvron et al. (2021) but without knowledge distillation. No specific version numbers for software dependencies are provided. |
| Experiment Setup | Yes | For this experiment, we train HydraViT for 300, 400, and 500 epochs with a pre-trained DeiT-tiny checkpoint. We assess all experiments and baselines on ImageNet-1K (Deng et al., 2009) at a resolution of 224×224. We implement on top of timm (Wightman, 2019) and train according to the procedure of Touvron et al. (2021) but without knowledge distillation. |
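The stochastic dropout training procedure quoted above (Algorithm 1) can be sketched in plain Python. This is a minimal, illustrative stub, not the paper's implementation: the ViT model, loss, and back-propagation are omitted, and all names (`H`, `N_EPOCHS`, `N_BATCHES`, `train`) are hypothetical placeholders.

```python
import random

# Sketch of HydraViT-style stochastic dropout training (Algorithm 1):
# for every batch, sample k uniformly from {1, ..., H} and update only
# the subnetwork V_{θ_k} built from the first k heads. The actual model,
# loss L(V_{θ_k}(x), y), and optimizer step are stubbed out here.

H = 10          # heads in the universal model (up to 10 subnetworks)
N_EPOCHS = 2    # N_epoch (illustrative, not the paper's 300-500)
N_BATCHES = 5   # N_batch

def train(seed=0):
    random.seed(seed)
    updates = {k: 0 for k in range(1, H + 1)}  # update count per subnetwork
    for _ in range(N_EPOCHS):
        for _ in range(N_BATCHES):
            # sample a subnetwork: k ~ U({1, 2, ..., H})
            k = random.randint(1, H)
            # compute L(V_{θ_k}(x_bi), y) and back-propagate through
            # V_{θ_k} only (omitted in this stub)
            updates[k] += 1
    return updates

counts = train()
```

Each iteration trains exactly one subnetwork, so over many batches every subnetwork size receives roughly `N_EPOCHS * N_BATCHES / H` updates in expectation.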