Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval
Authors: Zhihang Liu, Jun Li, Hongtao Xie, Pandeng Li, Jiannan Ge, Sun-Ao Liu, Guoqing Jin
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on three widely used benchmarks, including the out-of-distribution settings, show that the proposed framework achieves a new state-of-the-art performance with notable generalization ability (e.g., 4.42% and 7.69% average gains of R1@0.7 on Charades-STA and Charades-CG). The code will be available at https://github.com/lntzm/MESM. |
| Researcher Affiliation | Collaboration | 1 University of Science and Technology of China, Hefei, China 2 People's Daily Online |
| Pseudocode | No | The paper describes the methods in text and uses figures (Figure 2, Figure 3) to illustrate the pipeline, but it does not include any formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code will be available at https://github.com/lntzm/MESM. |
| Open Datasets | Yes | We evaluate the proposed method on three widely used datasets, which are Charades-STA (Gao et al. 2017), TACoS (Regneri et al. 2013), and QVHighlights (Lei, Berg, and Bansal 2021). We also experiment on Charades-CG (Li et al. 2022a), which proposes out-of-distribution (OOD) settings for Charades-STA. |
| Dataset Splits | Yes | We evaluate the proposed method on three widely used datasets, which are Charades-STA (Gao et al. 2017), TACoS (Regneri et al. 2013), and QVHighlights (Lei, Berg, and Bansal 2021). We also experiment on Charades-CG (Li et al. 2022a), which proposes out-of-distribution (OOD) settings for Charades-STA. |
| Hardware Specification | Yes | We build our model upon QD-DETR (Moon et al. 2023) with some optimizations, and train our model with Adam optimizer (Kingma and Ba 2014) on a single NVIDIA RTX 3090. |
| Software Dependencies | No | The paper mentions building upon 'QD-DETR (Moon et al. 2023)' and using 'Adam optimizer (Kingma and Ba 2014)', but it does not specify version numbers for these or other software libraries (e.g., PyTorch, TensorFlow, Python version) used in the experiments. |
| Experiment Setup | Yes | We set γ as 0.9, the hidden dimension of the transformer layers as 256, the layers of FW-MESM, MA, transformer encoder, and decoder as 2. |
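The hyperparameters quoted in the experiment-setup row can be collected into a single configuration for reference. This is a minimal sketch: the key names (e.g. `gamma`, `hidden_dim`) are our assumptions for illustration; only the values come from the paper's reported setup.

```python
# Hypothetical config sketch of the reported MESM hyperparameters.
# Key names are assumptions; values are taken from the paper's setup quote.
mesm_config = {
    "gamma": 0.9,             # γ weighting factor reported in the paper
    "hidden_dim": 256,        # hidden dimension of the transformer layers
    "num_fw_mesm_layers": 2,  # FW-MESM layers
    "num_ma_layers": 2,       # MA layers
    "num_encoder_layers": 2,  # transformer encoder layers
    "num_decoder_layers": 2,  # transformer decoder layers
    "optimizer": "Adam",      # Adam optimizer (Kingma and Ba 2014)
}

if __name__ == "__main__":
    for key, value in mesm_config.items():
        print(f"{key}: {value}")
```

Note that the paper does not report other common settings (learning rate, batch size, number of epochs), so reproduction would require consulting the released code at https://github.com/lntzm/MESM.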