Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval

Authors: Zhihang Liu, Jun Li, Hongtao Xie, Pandeng Li, Jiannan Ge, Sun-Ao Liu, Guoqing Jin

AAAI 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Experiments on three widely used benchmarks, including the out-of-distribution settings, show that the proposed framework achieves a new state-of-the-art performance with notable generalization ability (e.g., 4.42% and 7.69% average gains of R1@0.7 on Charades-STA and Charades-CG). The code will be available at https://github.com/lntzm/MESM. |
| Researcher Affiliation | Collaboration | 1 University of Science and Technology of China, Hefei, China; 2 People's Daily Online |
| Pseudocode | No | The paper describes the methods in text and uses figures (Figure 2, Figure 3) to illustrate the pipeline, but it does not include any formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code will be available at https://github.com/lntzm/MESM. |
| Open Datasets | Yes | We evaluate the proposed method on three widely used datasets, which are Charades-STA (Gao et al. 2017), TACoS (Regneri et al. 2013), and QVHighlights (Lei, Berg, and Bansal 2021). We also experiment on Charades-CG (Li et al. 2022a), which proposes out-of-distribution (OOD) settings for Charades-STA. |
| Dataset Splits | Yes | We evaluate the proposed method on three widely used datasets, which are Charades-STA (Gao et al. 2017), TACoS (Regneri et al. 2013), and QVHighlights (Lei, Berg, and Bansal 2021). We also experiment on Charades-CG (Li et al. 2022a), which proposes out-of-distribution (OOD) settings for Charades-STA. |
| Hardware Specification | Yes | We build our model upon QD-DETR (Moon et al. 2023) with some optimizations, and train our model with Adam optimizer (Kingma and Ba 2014) on a single NVIDIA RTX 3090. |
| Software Dependencies | No | The paper mentions building upon QD-DETR (Moon et al. 2023) and using the Adam optimizer (Kingma and Ba 2014), but it does not specify version numbers for these or for other software libraries (e.g., PyTorch, Python) used in the experiments. |
| Experiment Setup | Yes | We set γ as 0.9, the hidden dimension of the transformer layers as 256, and the layers of FW-MESM, MA, transformer encoder, and decoder as 2. |
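For reproduction purposes, the hyperparameters the paper does state can be collected into a single configuration object. The sketch below is a minimal assumption-laden illustration: only the values (γ = 0.9, hidden dimension 256, 2 layers each for FW-MESM, MA, encoder, and decoder) come from the paper; the class and field names are hypothetical, not taken from the MESM codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MESMConfig:
    """Hyperparameters reported in the paper's experiment setup.

    Field names are illustrative; only the values are from the paper.
    """
    gamma: float = 0.9        # the paper's γ coefficient
    hidden_dim: int = 256     # hidden dimension of the transformer layers
    fw_mesm_layers: int = 2   # FW-MESM layers
    ma_layers: int = 2        # MA layers
    encoder_layers: int = 2   # transformer encoder layers
    decoder_layers: int = 2   # transformer decoder layers


cfg = MESMConfig()
print(cfg)
```

A frozen dataclass like this keeps the reported settings in one immutable, printable place, which makes it easy to log the exact configuration alongside any reproduction run.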