Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Memory Consolidation Enables Long-Context Video Understanding
Authors: Ivana Balazevic, Yuge Shi, Pinelopi Papalampidi, Rahma Chaabouni, Skanda Koppula, Olivier J Henaff
ICML 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4. Experiments: We evaluate our method on four challenging datasets for long-context video understanding, namely Diving48, EgoSchema, Next-QA, and Perception Test. |
| Researcher Affiliation | Industry | ¹Google DeepMind. Correspondence to: Ivana Balažević <EMAIL>, Olivier J. Hénaff <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: Memory-consolidated ViT. Algorithm 2: Streaming ViT. Algorithm 3: Memory-augmented ViT. |
| Open Source Code | No | The paper does not explicitly state that the authors are releasing their code for the described methodology or provide a link to a code repository. |
| Open Datasets | Yes | We evaluate our method on four challenging datasets for long-context video understanding, namely Diving48, EgoSchema, Next-QA, and Perception Test. ... Diving48 (Li et al., 2018) ... EgoSchema (Mangalam et al., 2023) ... Next-QA (Xiao et al., 2021) ... Perception Test (Pătrăucean et al., 2023) |
| Dataset Splits | No | The paper refers to fine-tuning and evaluation, and mentions a 'test video' but does not explicitly provide details about a distinct validation set or its specific split for reproducibility. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types, or TPU versions) used for running its experiments. |
| Software Dependencies | No | The paper mentions general software components like 'BERT-style language encoder' and 'ViViT' and refers to a 'LoRA' adaptation without providing specific version numbers for these software dependencies. |
| Experiment Setup | Yes | Table 5. Training specifications for fine-tuning MC-ViT per dataset. Optimizer: AdamW; Learning rate schedule: cosine with linear warmup; Gradient clip: 2.0; Linear warmup steps: 1k; Frame-level resolution: 256 × 256; Batch size: 128, 256; Label smoothing: 0, 0.1; # memories/segment (K): 128, 512; Frame sampling: Uniform, 4 FPS; Weight decay rate: 0, 0, 1e-2; Base learning rate: 2e-5, 5e-5, 1e-6; Training steps: 5k, 30k, 20k. |
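
The Experiment Setup row names a cosine learning-rate schedule with 1k linear warmup steps. As a rough illustration of what such a schedule computes, here is a minimal sketch in plain Python. It is not the authors' code: the function name `lr_at_step` is hypothetical, the warmup is assumed to start from zero, the cosine phase is assumed to decay to zero, and the defaults (base learning rate 2e-5, 5k training steps) pair two values from the row above purely for illustration, since the mapping of values to datasets is not recoverable from the extracted table.

```python
import math


def lr_at_step(step, base_lr=2e-5, warmup_steps=1_000, total_steps=5_000):
    """Learning rate at a given step: linear warmup, then cosine decay.

    Assumptions (not stated in the report): warmup starts at 0 and the
    cosine phase decays to 0 at `total_steps`.
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup phase.
        return base_lr * step / warmup_steps
    # Fraction of the post-warmup phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    # Half-cosine from base_lr down to 0.
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for s in (0, 500, 1_000, 3_000, 5_000):
        print(s, lr_at_step(s))
```

In a real training loop this value would be passed to the optimizer at each step; the gradient clip of 2.0 listed in the same row would be applied separately to the gradients before the update.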