Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
HydraViT: Stacking Heads for a Scalable ViT
Authors: Janek Haberer, Ali Hojjat, Olaf Landsiedel
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results demonstrate the efficacy of HydraViT in achieving a scalable ViT with up to 10 subnetworks, covering a wide range of resource constraints. HydraViT achieves up to 5 p.p. more accuracy with the same GMACs and up to 7 p.p. more accuracy with the same throughput on ImageNet-1K compared to the baselines, making it an effective solution for scenarios where hardware availability is diverse or varies over time. We assess all experiments and baselines on ImageNet-1K (Deng et al., 2009) at a resolution of 224×224. |
| Researcher Affiliation | Academia | Janek Haberer*, Ali Hojjat*, Olaf Landsiedel Kiel University, Germany *Equal contribution EMAIL |
| Pseudocode | Yes | Algorithm 1: Stochastic dropout training. Data: HydraViT: V_{θ_k}, number of batches: N_batch, number of heads of the universal model: H, uniform distribution: U. for 1 ≤ e_i ≤ N_epoch do: for 1 ≤ b_i ≤ N_batch do: /* sample a subnetwork */ V_{θ_k}, k ~ U({1, 2, ..., H}); /* calculate single-objective loss */ L(V_{θ_k}(x_{b_i}), y); back-propagation through subnetwork V_{θ_k}; end; end |
| Open Source Code | Yes | The source code is available at https://github.com/ds-kiel/HydraViT. |
| Open Datasets | Yes | We assess all experiments and baselines on ImageNet-1K (Deng et al., 2009) at a resolution of 224×224. |
| Dataset Splits | No | The paper uses ImageNet-1K but does not explicitly state the training, validation, or test splits (e.g., percentages or sample counts). |
| Hardware Specification | Yes | Evaluated on an NVIDIA A100 80GB PCIe. |
| Software Dependencies | No | We implement on top of timm (Wightman, 2019) and train according to the procedure of Touvron et al. (2021) but without knowledge distillation. No specific version numbers for software dependencies are provided. |
| Experiment Setup | Yes | For this experiment, we train HydraViT for 300, 400, and 500 epochs with a pre-trained DeiT-tiny checkpoint. We assess all experiments and baselines on ImageNet-1K (Deng et al., 2009) at a resolution of 224×224. We implement on top of timm (Wightman, 2019) and train according to the procedure of Touvron et al. (2021) but without knowledge distillation. |
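The stochastic dropout training procedure quoted above (Algorithm 1) can be sketched in plain Python. This is a minimal, illustrative stub, not the paper's implementation: the ViT model, loss, and back-propagation are omitted, and all names (`H`, `N_EPOCHS`, `N_BATCHES`, `train`) are hypothetical placeholders.

```python
import random

# Sketch of HydraViT-style stochastic dropout training (Algorithm 1):
# for every batch, sample k uniformly from {1, ..., H} and update only
# the subnetwork V_{θ_k} built from the first k heads. The actual model,
# loss L(V_{θ_k}(x), y), and optimizer step are stubbed out here.

H = 10          # heads in the universal model (up to 10 subnetworks)
N_EPOCHS = 2    # N_epoch (illustrative, not the paper's 300-500)
N_BATCHES = 5   # N_batch

def train(seed=0):
    random.seed(seed)
    updates = {k: 0 for k in range(1, H + 1)}  # update count per subnetwork
    for _ in range(N_EPOCHS):
        for _ in range(N_BATCHES):
            # sample a subnetwork: k ~ U({1, 2, ..., H})
            k = random.randint(1, H)
            # compute L(V_{θ_k}(x_bi), y) and back-propagate through
            # V_{θ_k} only (omitted in this stub)
            updates[k] += 1
    return updates

counts = train()
```

Each iteration trains exactly one subnetwork, so over many batches every subnetwork size receives roughly `N_EPOCHS * N_BATCHES / H` updates in expectation.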