SpotEM: Efficient Video Search for Episodic Memory

Authors: Santhosh Kumar Ramakrishnan, Ziad Al-Halah, Kristen Grauman

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments on 200+ hours of video from the Ego4D EM Natural Language Queries benchmark and three different EM models demonstrate the effectiveness of our approach: computing only 10%-25% of the clip features, we preserve 84%-97% of the original EM model's accuracy.
Researcher Affiliation | Collaboration | ¹UT Austin, ²University of Utah, ³FAIR, Meta AI. Correspondence to: S. Ramakrishnan <sramakrishnan@utexas.edu>.
Pseudocode | No | The paper does not contain explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Project page: https://vision.cs.utexas.edu/projects/spotem
Open Datasets | Yes | We evaluate our approach on the large-scale EM NLQ benchmark from Ego4D (Grauman et al., 2022), which is the only public dataset supporting this task to our knowledge.
Dataset Splits | Yes | The dataset contains 11.3k/3.9k/4.0k queries annotated over 136/45/46 hours of train/val/test videos. (These totals are sanity-checked in the sketch below the table.)
Hardware Specification | No | The paper does not specify the hardware used for experiments (e.g., specific GPU or CPU models).
Software Dependencies | No | The paper mentions using PyTorch for implementation but does not specify version numbers for any software dependencies.
Experiment Setup | Yes | We provide the hyperparameters for training SpotEM in Table 3.
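
The figures quoted in the Research Type and Dataset Splits rows can be cross-checked with simple arithmetic. The following minimal Python sketch is not part of the SpotEM code release; it only restates the numbers quoted above, confirming that the split hours sum to the reported "200+ hours" and summarizing the stated compute/accuracy trade-off.

# Sanity check of the statistics quoted in the table above.
queries_k = {"train": 11.3, "val": 3.9, "test": 4.0}   # thousands of NLQ queries per split
hours = {"train": 136, "val": 45, "test": 46}           # hours of annotated video per split

total_hours = sum(hours.values())          # 227, consistent with the "200+ hours" figure
total_queries_k = sum(queries_k.values())  # ~19.2k queries overall

feature_fraction = (0.10, 0.25)    # fraction of clip features SpotEM computes
accuracy_retained = (0.84, 0.97)   # fraction of the original EM accuracy preserved
compute_saved = tuple(1.0 - f for f in feature_fraction)  # 90% down to 75% of feature compute saved

print(f"video: {total_hours} h total, queries: ~{total_queries_k:.1f}k total")
print(f"clip features computed: {feature_fraction[0]:.0%}-{feature_fraction[1]:.0%}")
print(f"feature compute saved:  {compute_saved[1]:.0%}-{compute_saved[0]:.0%}")
print(f"accuracy retained:      {accuracy_retained[0]:.0%}-{accuracy_retained[1]:.0%}")

Note that the paper reports the compute and accuracy figures only as ranges, so no pairing between a specific compute level and a specific accuracy level is implied here.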