Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].
TokenLearner: Adaptive Space-Time Tokenization for Videos
Authors: Michael Ryoo, AJ Piergiovanni, Anurag Arnab, Mostafa Dehghani, Anelia Angelova
NeurIPS 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate strong performance on several challenging benchmarks for video recognition tasks. We establish new state-of-the-arts on multiple video datasets, including Kinetics-400, Kinetics-600, Charades, and AViD. |
| Researcher Affiliation | Collaboration | Michael S. Ryoo¹˒², AJ Piergiovanni¹, Anurag Arnab¹, Mostafa Dehghani¹, Anelia Angelova¹; ¹Google Research, ²Stony Brook University; EMAIL |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code will be available at: https://github.com/google-research/scenic/tree/main/scenic/projects/token_learner |
| Open Datasets | Yes | We use the Kinetics datasets... We train and evaluate on both Kinetics-400 and Kinetics-600 datasets... We follow the standard settings used in previous papers and report accuracy on the validation set [5, 12]. Charades dataset [31]. AViD dataset [27]. |
| Dataset Splits | Yes | We train and evaluate on both Kinetics-400 and Kinetics-600 datasets... We follow the standard settings used in previous papers and report accuracy on the validation set [5, 12]. |
| Hardware Specification | No | The paper mentions FLOPs and GFLOPS for computational cost but does not specify the CPU, GPU, or other hardware used for running experiments. |
| Software Dependencies | No | The paper mentions using the Scenic library (built on JAX) but does not provide specific version numbers for JAX or other critical software dependencies. |
| Experiment Setup | Yes | Following the setting of [2], we used the input resolution of 224x224, extracting tubelets, and attaching positional encodings. We tried various number of tokens including S = 8, 16, 32, and use S = 8 and 16 as our default settings. We use 224 × 224 × 64 videos for training and 256 × 256 × 64 videos for testing. S = 8 number of tokens were used. |