DINTR: Tracking via Diffusion-based Interpolation
Authors: Pha Nguyen, Ngan Le, Jackson Cothren, Alper Yilmaz, Khoa Luu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5 Experimental Results |
| Researcher Affiliation | Academia | ¹University of Arkansas, ²Ohio State University; ¹{panguyen, thile, jcothre, khoaluu}@uark.edu, ²yilmaz.15@osu.edu |
| Pseudocode | Yes | Algorithm 1 Inplace Reconstruction Finetuning |
| Open Source Code | No | The techniques presented in this work are the intellectual property of [Affiliation], and the organization intends to seek patent coverage for the disclosed process. |
| Open Datasets | Yes | TAP-Vid [18] formalizes the problem of long-term physical Point Tracking. It contains 31,951 points tracked on 1,219 real videos. |
| Dataset Splits | No | The paper mentions fine-tuning on datasets such as TAP-Vid and PoseTrack, and evaluation on their respective benchmarks. However, it does not explicitly specify the training/validation/test splits used during model training (e.g., percentages or sample counts for each split). |
| Hardware Specification | Yes | The model is trained on 4 NVIDIA Tesla A100 GPUs with a batch size of 1, comprising a pair of frames. |
| Software Dependencies | No | The paper mentions building on 'LDM [13] and ADM [111]' but does not provide specific version numbers for these or other software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | The model is then fine-tuned using our proposed strategy for 500 steps with a learning rate of 3e-5. The model is trained on 4 NVIDIA Tesla A100 GPUs with a batch size of 1, comprising a pair of frames. We average the attention $A_S$ and $A_X$ in the interval $k \in [0, T \times 0.8]$ of the DDIM steps with the total timestep $T = 50$. For the first frame initialization, we employ YOLOX [112] as the detector, HRNet [113] as the pose estimator, and Mask2Former [114] as the segmentation model. We maintained a linear noise scheduler across all experiments... |
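The attention-averaging step quoted in the Experiment Setup row can be illustrated with a small sketch. It rests on assumptions the paper excerpt does not state: $k$ indexes the $T = 50$ DDIM steps starting from 0, the interval $[0, T \times 0.8]$ is inclusive at both ends (so $k = 0, \dots, 40$), and each attention map ($A_S$ or $A_X$) is recorded per step as a flattened list of floats. The function names `steps_in_interval` and `average_attention` are hypothetical; the authors' code is not released, so this is not their implementation.

```python
import math
from typing import Dict, List, Sequence


def steps_in_interval(total_steps: int = 50, ratio: float = 0.8) -> List[int]:
    """Return DDIM step indices k with 0 <= k <= total_steps * ratio.

    Assumes an inclusive upper bound; the small tolerance guards against
    floating-point round-off in total_steps * ratio.
    """
    upper = math.floor(total_steps * ratio + 1e-9)
    return [k for k in range(total_steps) if k <= upper]


def average_attention(
    maps_per_step: Dict[int, Sequence[float]], steps: Sequence[int]
) -> List[float]:
    """Element-wise mean of flattened attention maps over the selected steps."""
    selected = [maps_per_step[k] for k in steps if k in maps_per_step]
    if not selected:
        raise ValueError("no attention maps recorded for the selected steps")
    size = len(selected[0])
    if any(len(m) != size for m in selected):
        raise ValueError("attention maps must share the same shape")
    return [sum(m[i] for m in selected) / len(selected) for i in range(size)]


if __name__ == "__main__":
    T = 50
    steps = steps_in_interval(T, 0.8)  # k = 0..40 under the inclusive reading
    # Hypothetical recorded maps: step k -> flattened 2x2 attention map.
    a_s = {k: [float(k), 1.0, 0.0, 2.0] for k in range(T)}
    a_x = {k: [1.0, float(k % 2), 0.5, 0.5] for k in range(T)}
    mean_s = average_attention(a_s, steps)  # A_S and A_X are averaged separately
    mean_x = average_attention(a_x, steps)
    print(len(steps), mean_s[0], mean_x[2])  # prints: 41 20.0 0.5
```

Averaging $A_S$ and $A_X$ independently keeps the two attention sources separate, as the quoted sentence implies. If the interval were instead half-open ($k < 40$), only the comparison in `steps_in_interval` would change.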