TokenLearner: Adaptive Space-Time Tokenization for Videos

Authors: Michael Ryoo, AJ Piergiovanni, Anurag Arnab, Mostafa Dehghani, Anelia Angelova

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our experiments demonstrate strong performance on several challenging benchmarks for video recognition tasks. We establish new state-of-the-arts on multiple video datasets, including Kinetics-400, Kinetics-600, Charades, and AViD."
Researcher Affiliation | Collaboration | Michael S. Ryoo (Google Research, Stony Brook University), AJ Piergiovanni, Anurag Arnab, Mostafa Dehghani, Anelia Angelova (Google Research); {mryoo,ajpiergi,aarnab,dehghani,anelia}@google.com
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | "The code will be available at: https://github.com/google-research/scenic/tree/main/scenic/projects/token_learner"
Open Datasets | Yes | "We use the Kinetics datasets... We train and evaluate on both Kinetics-400 and Kinetics-600 datasets... We follow the standard settings used in previous papers and report accuracy on the validation set [5, 12]."; "Charades dataset [31]"; "AViD dataset [27]"
Dataset Splits | Yes | "We train and evaluate on both Kinetics-400 and Kinetics-600 datasets... We follow the standard settings used in previous papers and report accuracy on the validation set [5, 12]."
Hardware Specification | No | The paper reports FLOPs/GFLOPS for computational cost but does not specify the CPU, GPU, or other hardware used to run the experiments.
Software Dependencies | No | The paper mentions using the Scenic library (built on JAX) but does not give version numbers for JAX or other critical software dependencies.
Experiment Setup | Yes | "Following the setting of [2], we used the input resolution of 224×224, extracting tubelets, and attaching positional encodings. We tried various numbers of tokens including S = 8, 16, 32, and use S = 8 and 16 as our default settings."; "We use 224×224×64 videos for training and 256×256×64 videos for testing."; "S = 8 number of tokens were used."
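The core idea behind the token counts quoted above (S = 8, 16, 32) is that TokenLearner replaces a dense grid of spatial features with a small set of S adaptively pooled tokens. Below is a minimal NumPy sketch of that reduction, not the paper's implementation: the paper learns its attention maps with a small convolutional function, whereas here a single linear map followed by a spatial softmax stands in, and all shapes and weights are illustrative.

```python
import numpy as np

def softmax(z, axis):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def token_learner(x, w):
    """Reduce HW spatial feature vectors to S learned tokens.

    x: (HW, C) flattened per-frame features.
    w: (C, S) weights producing one spatial attention map per token
       (a stand-in for the paper's learned conv attention function).
    Returns: (S, C) tokens, each an attention-weighted spatial pooling of x.
    """
    attn = softmax(x @ w, axis=0)  # (HW, S); each column sums to 1 over space
    return attn.T @ x              # (S, C)

# Example: a 14x14 feature map with C=8 channels reduced to S=8 tokens,
# matching the paper's default token count.
rng = np.random.default_rng(0)
x = rng.normal(size=(14 * 14, 8))
w = rng.normal(size=(8, 8))
tokens = token_learner(x, w)
print(tokens.shape)  # (8, 8)
```

Applied per frame, this turns a 14×14 = 196-position grid into 8 tokens, which is where the paper's reported FLOP savings for the subsequent transformer layers come from.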