CoSTA: End-to-End Comprehensive Space-Time Entanglement for Spatio-Temporal Video Grounding

Authors: Yaoyuan Liang, Xiao Liang, Yansong Tang, Zhao Yang, Ziran Li, Jingang Wang, Wenbo Ding, Shao-Lun Huang

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conduct extensive experiments on the challenging benchmarks of HC-STVG and VidSTG, where CoSTA outperforms existing state-of-the-art methods, demonstrating its effectiveness for this task."
Researcher Affiliation | Collaboration | 1) Shenzhen Key Laboratory of Ubiquitous Data Enabling, Tsinghua Shenzhen International Graduate School, Tsinghua University; 2) University of Oxford; 3) Meituan Inc.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper neither states that the code for the described methodology is open-sourced nor links to a code repository.
Open Datasets | Yes | "We evaluate our proposed method on two mainstream benchmarks HC-STVG (Tang et al. 2021) and VidSTG (Zhang et al. 2020c)"
Dataset Splits | Yes | "The HC-STVG dataset ... is divided into training and test subsets with 4,500 and 1,160 video-sentence pairs. This dataset is extended to HC-STVG V2 ..., which contains 10,131 and 3,482 videos in the training and validation subsets, respectively. The VidSTG dataset ... is divided into training, validation and test subsets with 80,684, 8,956 and 10,303 distinct sentence-tube pairs."
Hardware Specification | No | The paper does not specify the hardware (e.g., GPU/CPU models, memory) used to run its experiments.
Software Dependencies | No | The paper mentions using RoBERTa (Liu et al. 2019c) but does not give version numbers for it or for any other software libraries or frameworks used in the implementation.
Experiment Setup | No | The paper mentions a sampling mechanism with ratio β ∈ [0, 1] and balancing weights λ in the loss function, and discusses "faster convergence", but it does not provide values for hyperparameters such as learning rate, batch size, number of epochs, or detailed optimizer settings.