Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles

Authors: Chaitanya Ryali, Yuan-Ting Hu, Daniel Bolya, Chen Wei, Haoqi Fan, Po-Yao Huang, Vaibhav Aggarwal, Arkabandhu Chowdhury, Omid Poursaeed, Judy Hoffman, Jitendra Malik, Yanghao Li, Christoph Feichtenhofer

ICML 2023

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We evaluate Hiera on a variety of tasks for image and video recognition. |
| Researcher Affiliation | Collaboration | Meta AI (FAIR), Georgia Tech, Johns Hopkins University. |
| Pseudocode | No | The paper includes architectural diagrams (e.g., Figures 2, 4, 5, 6) and descriptions of processes, but it does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code and models are available at https://github.com/facebookresearch/hiera. (See the hub-loading sketch after the table.) |
| Open Datasets | Yes | ImageNet-1K (IN1K, Deng et al. (2009)) and Kinetics-400 (K400, Kay et al. (2017)). We ablate using our large model, Hiera-L, to ensure that our method works at scale. We evaluate performance by finetuning. All metrics are top-1 accuracies using standard evaluation protocols: a single (resized) center crop on IN1K and 3 spatial × 5 temporal views on K400. (See the evaluation-transform sketch after the table.) |
| Dataset Splits | Yes | We evaluate performance by finetuning. All metrics are top-1 accuracies using standard evaluation protocols: a single (resized) center crop on IN1K and 3 spatial × 5 temporal views on K400. For each ablation, we use 400 (800) epochs of sparse MAE pretraining for IN1K (K400) and 50 epochs of dense finetuning unless otherwise noted. |
| Hardware Specification | Yes | All benchmarks in this paper are on an A100 with fp16 (as this setting is most useful in practice) unless noted otherwise. We use an NVIDIA A100 40GB GPU, PyTorch v1.12.1, and CUDA 11.4 to benchmark speed for all baselines and our approach, unless otherwise mentioned. (See the benchmarking sketch after the table.) |
| Software Dependencies | Yes | We use an NVIDIA A100 40GB GPU, PyTorch v1.12.1, and CUDA 11.4 to benchmark speed for all baselines and our approach, unless otherwise mentioned. |
| Experiment Setup | Yes | Table 11: Settings for Kinetics-400, -600, -700. (a) Pretraining (e.g., 'optimizer AdamW', 'learning rate 8e-4', 'warmup epochs 120', 'epochs 800 / 1600 / 3200', 'batch size 512', 'mask ratio 0.9', 'drop path 0.1'). (b) Finetuning. Similar detailed settings are provided in Tables 12, 13, 14, and 15 for SSv2, AVA, ImageNet-1K, and COCO, respectively. (See the pretraining-schedule sketch after the table.) |
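
For the Open Source Code row, a minimal sketch of pulling a pretrained model from the released repository via torch.hub. The entrypoint name `hiera_base_224` and the `pretrained` keyword are assumptions based on common facebookresearch hub conventions, not verified against the repo's hubconf:

```python
import torch

# Hypothetical entrypoint name and keyword argument: "hiera_base_224" and
# pretrained=True follow typical facebookresearch torch.hub conventions but are
# assumptions here -- check the repository's README/hubconf for exact names.
model = torch.hub.load(
    "facebookresearch/hiera",
    "hiera_base_224",
    pretrained=True,
)
model.eval()

# Dummy forward pass at the ImageNet-1K finetuning resolution (224x224).
with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))
print(logits.shape)
```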
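
For the single (resized) center-crop IN1K evaluation quoted in the Open Datasets and Dataset Splits rows, a minimal torchvision sketch. The 256-to-224 resize/crop pair and the normalization constants are assumed standard defaults rather than values stated in the paper:

```python
from torchvision import transforms

# Single (resized) center-crop evaluation pipeline for IN1K.
# The 256 -> 224 resize/crop pair and the ImageNet mean/std are common
# conventions assumed here; the paper only states "a single (resized)
# center crop".
eval_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```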
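
For the Hardware and Software rows, a rough sketch of how fp16 inference speed on a single A100 could be benchmarked with the reported PyTorch setup. The batch size and iteration counts are arbitrary assumptions:

```python
import torch

def images_per_second(model, batch_size=64, iters=50, warmup=10):
    """Rough fp16 throughput measurement on a single CUDA GPU (e.g., an A100)."""
    model = model.cuda().eval()
    x = torch.randn(batch_size, 3, 224, 224, device="cuda")

    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)

    with torch.no_grad(), torch.cuda.amp.autocast(dtype=torch.float16):
        for _ in range(warmup):   # warm up kernels before timing
            model(x)
        torch.cuda.synchronize()
        start.record()
        for _ in range(iters):
            model(x)
        end.record()
        torch.cuda.synchronize()

    ms_per_iter = start.elapsed_time(end) / iters
    return batch_size / (ms_per_iter / 1000.0)
```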
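
For the Experiment Setup row, a sketch of the K400 MAE-pretraining optimizer and learning-rate schedule under the quoted numbers (AdamW, learning rate 8e-4, 120 warmup epochs, 800 epochs). Weight decay, betas, and the cosine-decay shape are assumptions not stated in the quoted settings:

```python
import math
import torch

# Values quoted in the Experiment Setup row (Table 11a, K400 pretraining).
base_lr = 8e-4
warmup_epochs = 120
total_epochs = 800          # 800 for K400; the row also lists 1600 / 3200

# Stand-in module; in practice this would be the sparse Hiera MAE model.
model = torch.nn.Linear(8, 8)

# Weight decay and betas are assumed MAE-style defaults, not quoted values.
optimizer = torch.optim.AdamW(model.parameters(), lr=base_lr,
                              betas=(0.9, 0.95), weight_decay=0.05)

def lr_at_epoch(epoch: int) -> float:
    """Linear warmup followed by cosine decay (assumed schedule shape)."""
    if epoch < warmup_epochs:
        return base_lr * epoch / warmup_epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```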