QDETRv: Query-Guided DETR for One-Shot Object Localization in Videos
Authors: Yogesh Kumar, Saswat Mallick, Anand Mishra, Sowmya Rasipuram, Anutosh Maitra, Roshni Ramnani
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that the proposed model significantly outperforms the competitive baselines on two public benchmarks, VidOR and ImageNet-VidVRD, extended for one-shot open-set localization tasks. |
| Researcher Affiliation | Collaboration | Yogesh Kumar (1), Saswat Mallick (1), Anand Mishra (1), Sowmya Rasipuram (2), Anutosh Maitra (2), Roshni Ramnani (2); (1) Indian Institute of Technology Jodhpur, India; (2) Accenture Labs |
| Pseudocode | No | The paper includes mathematical equations and descriptions of the model components but does not provide any formal pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain an explicit statement about making the source code available or provide a link to a code repository for the methodology described. |
| Open Datasets | Yes | In this work, we employed three primary datasets: VidOR (Shang et al. 2019b), ImageNet-VidVRD (Shang et al. 2017b), and Open Images (Kuznetsova et al. 2018). ... The UCF101 (Soomro, Zamir, and Shah 2012) dataset, featuring 13,320 videos with diverse complexities, was employed for pretraining |
| Dataset Splits | Yes | To facilitate our study, we have extended two existing datasets, VidOR (Shang et al. 2019a) and ImageNet-VidVRD (Shang et al. 2017a), by splitting them into train and test sets... Dataset statistics are provided in Table 2. (Table 2 shows '#Train Videos' and '#Test Videos' counts) |
| Hardware Specification | Yes | We trained the model on three Nvidia-RTX A6000 GPUs. |
| Software Dependencies | No | Our implementation was done using the PyTorch library. (No version number is specified for PyTorch or any other software dependencies.) |
| Experiment Setup | Yes | We pre-train and fine-tune our models using the Adam optimizer (Kingma and Ba 2015) with an initial learning rate = 1e-5. ... We train the model for 200 epochs with batch size = 350. |
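The reported experiment setup (Adam optimizer, initial learning rate 1e-5, 200 epochs, batch size 350) can be sketched as a minimal PyTorch training skeleton. This is an illustrative reconstruction only: the model and loss below are placeholders, not the QDETRv architecture or its DETR-style matching losses, which the paper does not release code for.

```python
# Minimal sketch of the training configuration quoted above.
# Placeholder model and loss; only the hyperparameters (Adam, lr=1e-5,
# 200 epochs, batch size 350) come from the paper.
import torch
import torch.nn as nn

model = nn.Linear(256, 256)  # stand-in for the QDETRv model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)

NUM_EPOCHS = 200   # "We train the model for 200 epochs"
BATCH_SIZE = 350   # "batch size = 350"

def train_step(batch: torch.Tensor) -> float:
    """One optimization step with a placeholder regression loss."""
    optimizer.zero_grad()
    output = model(batch)
    loss = output.pow(2).mean()  # placeholder; not the paper's loss
    loss.backward()
    optimizer.step()
    return loss.item()

# Single illustrative step on random data shaped like one batch.
batch = torch.randn(BATCH_SIZE, 256)
step_loss = train_step(batch)
```

In a full run, `train_step` would be called over every batch for all 200 epochs, with the pretraining and fine-tuning phases the paper describes sharing this optimizer configuration.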