CoSTA: End-to-End Comprehensive Space-Time Entanglement for Spatio-Temporal Video Grounding
Authors: Yaoyuan Liang, Xiao Liang, Yansong Tang, Zhao Yang, Ziran Li, Jingang Wang, Wenbo Ding, Shao-Lun Huang
AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on the challenging benchmarks of HC-STVG and VidSTG, where CoSTA outperforms existing state-of-the-art methods, demonstrating its effectiveness for this task. |
| Researcher Affiliation | Collaboration | 1Shenzhen Key Laboratory of Ubiquitous Data Enabling, Tsinghua Shenzhen International Graduate School, Tsinghua University; 2University of Oxford; 3Meituan Inc. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about open-sourcing the code for the described methodology or a link to a code repository. |
| Open Datasets | Yes | We evaluate our proposed method on two mainstream benchmarks HC-STVG (Tang et al. 2021) and VidSTG (Zhang et al. 2020c) |
| Dataset Splits | Yes | HC-STVG dataset ... is divided into training and test subsets with 4,500 and 1,160 video-sentence pairs. This dataset is extended to HC-STVG V2 ..., which contains 10,131 and 3,482 videos in training and validation subsets, respectively. VidSTG dataset ... is divided into training, validation and test subsets with 80,684, 8,956 and 10,303 distinct sentence-tube pairs... |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions using RoBERTa (Liu et al. 2019c) but does not provide specific version numbers for it or any other software libraries or frameworks used in the implementation. |
| Experiment Setup | No | The paper mentions a sampling mechanism with ratio β ∈ [0, 1] and balancing weights λ for the loss function, and discusses 'faster convergence'. However, it does not provide specific values for hyperparameters such as learning rate, batch size, number of epochs, or detailed optimizer settings. |