Phrase-Level Temporal Relationship Mining for Temporal Sentence Localization
Authors: Minghang Zheng, Sizhe Li, Qingchao Chen, Yuxin Peng, Yang Liu
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on the ActivityNet Captions and Charades-STA datasets show the effectiveness of our method on both phrase and sentence temporal localization and enable better model interpretability and generalization when dealing with unseen compositions of seen concepts. |
| Researcher Affiliation | Academia | Minghang Zheng^1, Sizhe Li^1, Qingchao Chen^2, Yuxin Peng^{1,3}, Yang Liu^{1,4}* — ^1 Wangxuan Institute of Computer Technology, Peking University, Beijing, China; ^2 National Institute of Health Data Science, Peking University, Beijing, China; ^3 Peng Cheng Laboratory, Shenzhen, China; ^4 National Key Laboratory of General Artificial Intelligence, BIGAI, Beijing, China. {minghang, lisizhe, qingchao.chen, pengyuxin, yangliu}@pku.edu.cn |
| Pseudocode | No | The paper includes architectural diagrams and mathematical formulations but does not contain a discrete pseudocode block or algorithm steps labeled as such. |
| Open Source Code | Yes | Code can be found at https://github.com/minghangz/TRM. |
| Open Datasets | Yes | Charades-STA (Gao et al. 2017) originates from the Charades (Sigurdsson et al. 2016) dataset... ActivityNet Captions (Krishna et al. 2017) contains 20K videos... |
| Dataset Splits | Yes | Charades-STA (Gao et al. 2017) originates from the Charades (Sigurdsson et al. 2016) dataset, containing indoor videos with sentence queries and corresponding annotations. There are 12,408 and 3,720 video-query pairs for training and testing respectively. Our sentence-level results are reported on the test split. ActivityNet Captions (Krishna et al. 2017) contains 20K videos, with 37,417/17,505/17,031 video-query pairs in the train/val_1/val_2 splits. |
| Hardware Specification | No | The paper mentions using VGG and C3D features, but it does not specify any hardware components (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using "pre-trained SRL-BERT (Shi and Lin 2019)", a "pre-trained DistilBERT (Sanh et al. 2019) model following MMN (Wang et al. 2021b)", and the "AdamW (Loshchilov and Hutter 2017) optimizer". However, it does not provide specific version numbers for any of these software components or libraries. |
| Experiment Setup | Yes | We use the AdamW (Loshchilov and Hutter 2017) optimizer with learning rate 1×10⁻⁴ and batch size 12 for Charades, and learning rate 1×10⁻⁴ and batch size 20 for ActivityNet Captions. The learning rate of DistilBERT is 1/10 of our main model. |
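The optimizer setup quoted in the table (AdamW at 1×10⁻⁴, with the DistilBERT text encoder trained at 1/10 of the main learning rate) can be sketched as parameter groups. This is a minimal illustration only; the function and parameter names below are assumptions, not taken from the authors' released code at the linked repository.

```python
# Hypothetical sketch of the paper's optimizer configuration:
# AdamW, lr 1e-4 (batch 12 for Charades-STA, batch 20 for ActivityNet
# Captions), with DistilBERT at 1/10 of the main learning rate.
BASE_LR = 1e-4

def build_param_groups(main_params, bert_params, base_lr=BASE_LR):
    """Return AdamW-style parameter groups with a reduced lr for DistilBERT."""
    return [
        {"params": main_params, "lr": base_lr},         # main TRM model
        {"params": bert_params, "lr": base_lr / 10.0},  # DistilBERT encoder
    ]

# In PyTorch these groups would be passed to torch.optim.AdamW(groups);
# placeholder strings stand in for actual parameter tensors here.
groups = build_param_groups(main_params=["w_main"], bert_params=["w_bert"])
```

With real models, `main_params` and `bert_params` would come from the respective modules' `.parameters()` iterators.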