Compositional Prompt Tuning with Motion Cues for Open-vocabulary Video Relation Detection
Authors: Kaifeng Gao, Long Chen, Hanwang Zhang, Jun Xiao, Qianru Sun
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our RePro achieves a new state-of-the-art performance on two VidVRD benchmarks of not only the base training object and predicate categories, but also the unseen ones. Extensive ablations also demonstrate the effectiveness of the proposed compositional and multi-mode design of prompts. Code is available at https://github.com/Dawn-LX/OpenVoc-VidVRD. (Section 4 EXPERIMENTS: 4.1 Datasets and Evaluation Metrics; 4.2 Implementation Details; 4.3 Evaluate Open-vocabulary Object Tracklet Detection; 4.4 Evaluate Open-vocabulary Relation Classification; 4.5 Ablation Studies) |
| Researcher Affiliation | Academia | Kaifeng Gao¹, Long Chen², Hanwang Zhang³, Jun Xiao¹, Qianru Sun⁴. ¹Zhejiang University, ²The Hong Kong University of Science and Technology, ³Nanyang Technological University, ⁴Singapore Management University. ¹{kite_phone,junx}@zju.edu.cn, ²zjuchenlong@gmail.com, ³hanwangzhang@ntu.edu.sg, ⁴qianrusun@smu.edu.sg |
| Pseudocode | No | The paper describes the proposed method in prose and provides figures to illustrate the pipeline (e.g., Figure 3), but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/Dawn-LX/OpenVoc-VidVRD. |
| Open Datasets | Yes | We evaluated our method on the VidVRD (Shang et al., 2017) and VidOR (Shang et al., 2019) benchmarks. |
| Dataset Splits | Yes | VidVRD consists of 1,000 videos and covers 35 object categories and 132 predicate categories. We used official splits: 800 videos for training and 200 videos for testing. ... VidOR consists of 10,000 videos and covers 80 object categories and 50 predicate categories. We used official splits: 7,000 videos for training, 835 videos for validation, and 2,165 videos for testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory specifications, or cloud computing instance types) used for running the experiments. It only mentions using Adam for training. |
| Software Dependencies | No | The paper mentions using Adam for training and building upon ALPro, VinVL, and VidVRD-II. However, it does not specify version numbers for any software dependencies, programming languages, or libraries (e.g., Python version, PyTorch/TensorFlow version). |
| Experiment Setup | Yes | The prompt length L was set to 10. The softmax temperature τ was set as learnable. The GIoU threshold γ was chosen based on training-set statistics, making the tracklet pairs evenly distributed w.r.t. different motion patterns: γ was set to -0.3 for VidVRD and -0.25 for VidOR. We trained our RePro using Adam (Kingma & Ba, 2014) with a learning rate of 1e-4, and stopped training when the SGDet mAP dropped. A hedged sketch of this setup appears below the table. |
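
A minimal sketch of how the reported setup might be wired together in PyTorch (an assumption; the paper does not name its framework or any versions). `PromptedClassifier`, `motion_pattern`, and the embedding dimension are hypothetical illustrations of the quoted hyperparameters (prompt length L = 10, learnable temperature τ, GIoU threshold γ, Adam with learning rate 1e-4), not the authors' actual implementation.

```python
import torch
import torch.nn as nn
from torchvision.ops import generalized_box_iou

PROMPT_LENGTH = 10   # prompt length L (Section 4.2)
GIOU_GAMMA = -0.3    # gamma for VidVRD; -0.25 for VidOR

class PromptedClassifier(nn.Module):
    """Illustrative prompt-tuning head with a learnable softmax temperature tau."""
    def __init__(self, embed_dim: int, prompt_len: int = PROMPT_LENGTH):
        super().__init__()
        # Learnable context vectors; in the full model these would be
        # composed with class-name token embeddings to form the prompt.
        self.prompt = nn.Parameter(0.02 * torch.randn(prompt_len, embed_dim))
        # tau is stored in log space so exp() keeps it positive.
        self.log_tau = nn.Parameter(torch.zeros(()))

    def forward(self, feats: torch.Tensor, class_embeds: torch.Tensor) -> torch.Tensor:
        # Similarity logits scaled by the learnable temperature.
        return (feats @ class_embeds.t()) / self.log_tau.exp()

def motion_pattern(subj_start, obj_start, subj_end, obj_end, gamma=GIOU_GAMMA):
    """One plausible binning: threshold the subject-object GIoU at the first
    and last frames of a tracklet pair by gamma, giving coarse motion patterns."""
    near_start = generalized_box_iou(subj_start, obj_start).item() > gamma
    near_end = generalized_box_iou(subj_end, obj_end).item() > gamma
    return near_start, near_end

# Example: a pair that starts apart and ends overlapping ("moving closer").
subj0, obj0 = torch.tensor([[0., 0., 10., 10.]]), torch.tensor([[50., 50., 60., 60.]])
subj1, obj1 = torch.tensor([[0., 0., 10., 10.]]), torch.tensor([[5., 5., 15., 15.]])
print(motion_pattern(subj0, obj0, subj1, obj1))  # (False, True)

model = PromptedClassifier(embed_dim=256)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
# Training would halt once validation SGDet mAP starts to drop (early stopping).
```

Because γ was chosen so that tracklet pairs distribute evenly across motion patterns, a faithful reproduction would calibrate the threshold on training-set GIoU statistics (-0.3 for VidVRD, -0.25 for VidOR) rather than hand-tune it.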