Weakly-Supervised Video Moment Retrieval via Semantic Completion Network
Authors: Zhijie Lin, Zhou Zhao, Zhu Zhang, Qi Wang, Huasheng Liu
AAAI 2020, pp. 11539-11546
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on the Activity Captions and Charades-STA demonstrate the effectiveness of our proposed method. |
| Researcher Affiliation | Collaboration | Zhijie Lin,1 Zhou Zhao,1 Zhu Zhang,1 Qi Wang,2 Huasheng Liu2 1College of Computer Science, Zhejiang University, Hangzhou, China, 2Alibaba Inc., China |
| Pseudocode | No | The paper describes algorithms in text and diagrams but does not include formal pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any link or statement about open-sourcing its code. |
| Open Datasets | Yes | Experiments on the Activity Captions and Charades-STA demonstrate the effectiveness of our proposed method. ... Activity Captions. The Activity Captions (Caba Heilbron et al. 2015) dataset ... Charades-STA. The Charades-STA dataset is released in (Gao et al. 2017) for moment retrieval... |
| Dataset Splits | Yes | The released Activity Captions dataset comprises 17,031 description-moment pairs for training. Since the caption annotations of the test data of Activity Captions are not publicly available, we take val_1 as the validation set and val_2 as the test data. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used, such as GPU/CPU models or memory. |
| Software Dependencies | No | The paper mentions software components like 'word2vec', 'NLTK', 'pretrained Glove', 'Adam optimizer', and 'Transformer' but does not specify their version numbers. |
| Experiment Setup | Yes | Model Settings. At each time step of the video, we score nk candidate proposals of multiple scales. We set nk to 6 with ratios of [0.167, 0.333, 0.500, 0.667, 0.834, 1.0] for Activity Captions, and to 4 with ratios of [0.167, 0.250, 0.333, 0.500] for Charades-STA. We then set the decay hyperparameter λ1 to 0.5, λ2 to 2000, the number of selected proposals K to 4, and the balance hyperparameter β to 0.1. Also, we mask one-third of the words in a sentence and replace them with a special token for semantic completion. Note that nouns and verbs are more likely to be masked. Moreover, for both the Transformer Encoder and the Transformer Decoder, the dimension of the hidden state is set to 256 and the number of layers is set to 3. During training, we adopt the Adam optimizer with learning rate 0.0002 to minimize the multi-task loss. The learning rate increases linearly to the maximum over a warm-up of 400 steps and then decays based on the number of updates (Vaswani et al. 2017). |
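
For readers attempting to reproduce the setup quoted in the "Experiment Setup" row, the following is a minimal sketch (not the authors' code, which is not released per the "Open Source Code" row) of two of the reported choices: scoring multi-scale candidate proposals at each video time step, and an Adam learning-rate schedule with a 400-step warm-up followed by inverse-square-root decay in the style of Vaswani et al. (2017). The ratio lists, the peak learning rate of 0.0002, and the 400-step warm-up come from the quoted paragraph; the function names, the exact decay scaling, and the clipping of proposal boundaries at the start of the video are assumptions made for illustration.

```python
# Minimal sketch under the assumptions stated above; not the authors' implementation.

def candidate_proposals(t, num_steps, ratios):
    """Return (start, end) index pairs for candidate proposals ending at time step t.

    Each ratio gives the proposal length as a fraction of the full video, e.g.
    ratios = [0.167, 0.333, 0.500, 0.667, 0.834, 1.0] for Activity Captions and
    [0.167, 0.250, 0.333, 0.500] for Charades-STA, as reported in the paper.
    """
    proposals = []
    for r in ratios:
        length = max(1, round(r * num_steps))
        start = max(0, t - length + 1)  # clip at the beginning of the video (assumption)
        proposals.append((start, t))
    return proposals


def warmup_inverse_sqrt_lr(step, peak_lr=0.0002, warmup_steps=400):
    """Linear warm-up to peak_lr over warmup_steps, then decay with the inverse
    square root of the update count (assumed form of the schedule referenced
    via Vaswani et al., 2017)."""
    step = max(step, 1)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (warmup_steps / step) ** 0.5


if __name__ == "__main__":
    # 6 proposals per time step for Activity Captions, as reported.
    print(candidate_proposals(t=50, num_steps=100,
                              ratios=[0.167, 0.333, 0.500, 0.667, 0.834, 1.0]))
    # Learning rate before, at, and after the 400-step warm-up.
    print([round(warmup_inverse_sqrt_lr(s), 6) for s in (100, 400, 1600)])
```

The script prints the six proposals ending at step 50 of a 100-step video and the learning rate at steps 100, 400, and 1600 (0.00005, 0.0002, 0.0001), illustrating the warm-up-then-decay shape described in the quoted setup.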