EVEREST: Efficient Masked Video Autoencoder by Removing Redundant Spatiotemporal Tokens
Authors: Sunil Hwang, Jaehong Yoon, Youngwan Lee, Sung Ju Hwang
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5. Experiments. We extensively validate our proposed method on multiple benchmark datasets, including UCF101, HMDB51, K400, Something-Something V2, and Ego4D, and our EVEREST shows remarkable efficiency in terms of memory occupancy, computational cost, and training time compared to strong counterparts, achieving competitive performance. |
| Researcher Affiliation | Collaboration | ¹Korea Military Academy, ²UNC Chapel Hill, ³KAIST, ⁴ETRI, ⁵Deep Auto. |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states that its implementation is based on the VideoMAE repository, but it does not explicitly state that the code for EVEREST (the proposed method) is released or provide a link to it. |
| Open Datasets | Yes | We extensively validate our proposed method on multiple benchmark datasets, including UCF101 (Soomro et al., 2012), HMDB51 (Kuehne et al., 2011), Something-Something v2 (SSv2) (Goyal et al., 2017), Kinetics-400 (K400) (Kay et al., 2017) and Ego4D (Grauman et al., 2022). |
| Dataset Splits | Yes | The OSCC dataset is a subset of the Ego4D dataset, consisting of 41.1k/21.2k train/val 8-second videos. |
| Hardware Specification | Yes | enabling the pre-training and fine-tuning on a single machine with 8 GPUs; using one node equipped with 8 A100 (80GB) GPUs; a single-node machine equipped with 8 A6000 (48GB) GPUs; 4 NVIDIA RTX 3090 GPUs are used. |
| Software Dependencies | No | The paper mentions software components such as the AdamW optimizer and cosine decay schedule but does not provide version numbers for these or other key software dependencies (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | Table 7: Pre-training settings for K400, SSv2, UCF101, HMDB51 and OSCC; Table 8: Fine-tuning settings for K400, SSv2, UCF101, HMDB51 and OSCC. These tables specify detailed hyperparameters such as optimizer, learning rate, batch size, warmup epochs, masking ratios, and augmentation strategies. |
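For readers attempting a replication, a minimal sketch of how the Table 7 pre-training settings could be recorded as a config is shown below. The field names and all numeric values are illustrative placeholders, not the paper's reported numbers; the actual hyperparameters should be taken from Tables 7 and 8 of the paper.

```python
# Minimal sketch of a pre-training config in the spirit of Table 7.
# All numeric values are illustrative placeholders, NOT the paper's
# reported settings; consult Tables 7-8 of the EVEREST paper.
from dataclasses import dataclass


@dataclass
class PretrainConfig:
    dataset: str = "UCF101"      # one of K400, SSv2, UCF101, HMDB51, OSCC
    optimizer: str = "adamw"     # the paper reports an AdamW optimizer
    base_lr: float = 1.5e-4      # placeholder learning rate
    batch_size: int = 256        # placeholder batch size
    warmup_epochs: int = 40      # placeholder warmup length
    total_epochs: int = 800      # placeholder schedule length
    mask_ratio: float = 0.9      # placeholder masking ratio
    lr_schedule: str = "cosine"  # the paper reports cosine decay


if __name__ == "__main__":
    # Print the config so a replication log records the exact settings used.
    print(PretrainConfig())
```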