TrackIME: Enhanced Video Point Tracking via Instance Motion Estimation

Authors: Seong Hyeon Park, Huiwon Jang, Byungwoo Jeon, Sukmin Yun, Paul Hongsuck Seo, Jinwoo Shin

NeurIPS 2024

Each entry below lists a reproducibility variable, the assessed result, and the LLM's supporting response.

Research Type: Experimental
LLM Response: For example, on the recent TAP-Vid benchmark, our framework consistently improves all baselines, e.g., up to 13.5% improvement on the average Jaccard metric.

Researcher Affiliation: Academia
LLM Response: 1 KAIST, 2 Hanyang University ERICA, 3 Korea University. {seonghyp, huiwoen0516, imbw2024, jinwoos}@kaist.ac.kr, sukminyun@hanyang.ac.kr, phseo@korea.ac.kr

Pseudocode: No
LLM Response: The paper provides a workflow diagram (Figure 1) and mathematical formulations, but no explicitly labeled pseudocode or algorithm blocks.

Open Source Code: Yes
LLM Response: The open-source version of TrackIME is available at https://github.com/kami93/trackime.

Open Datasets: Yes
LLM Response: We evaluate these models on three different datasets, DAVIS [12], Kinetics [34], and RGBStacking [33], each representing different characteristics.

Dataset Splits: No
LLM Response: The paper mentions using validation and test-dev sets for the zero-shot benchmark ("In particular, we use the validation and the test-dev sets for the zero-shot benchmark.") but does not specify the train/validation/test splits (percentages, counts, or splitting methodology) needed for reproduction. It refers to established datasets such as DAVIS, which have predefined splits, but does not explicitly state whether those splits were used or how any custom partitioning was done.

Hardware Specification: Yes
LLM Response: Every baseline model and internal module in TrackIME (e.g., Segment Anything [1]) is implemented in PyTorch 2.1 [32] compiled for CUDA 11.8, which we run on an NVIDIA RTX 3090 GPU.

Software Dependencies: Yes
LLM Response: Every baseline model and internal module in TrackIME (e.g., Segment Anything [1]) is implemented in PyTorch 2.1 [32] compiled for CUDA 11.8, which we run on an NVIDIA RTX 3090 GPU. (A minimal environment check based on this setup is sketched after the table.)

Experiment Setup: Yes
LLM Response: For example, we choose the hyperparameters for each baseline, e.g., progressive inference steps K = 2, and the pruning sizes H0 = W0 = 960 and H1 = W1 = 384 when incorporated with TAPIR [6].
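The Experiment Setup entry names concrete hyperparameter values for the TAPIR integration. The following minimal Python sketch collects them in a configuration object; the TrackIMEConfig class and its field names (num_steps, prune_sizes) are hypothetical illustrations rather than the authors' API, and only the values (K = 2, H0 = W0 = 960, H1 = W1 = 384) come from the quoted passage.

```python
# Hypothetical configuration container for the hyperparameters quoted above.
# Only the values come from the paper; the class and field names are
# illustrative and are not taken from the TrackIME codebase.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class TrackIMEConfig:
    num_steps: int = 2  # progressive inference steps K
    # prune_sizes[k] = (H_k, W_k): frame size after pruning at step k
    prune_sizes: List[Tuple[int, int]] = field(
        default_factory=lambda: [(960, 960), (384, 384)]
    )

cfg = TrackIMEConfig()
assert len(cfg.prune_sizes) == cfg.num_steps  # one pruning size per step
print(cfg)
```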
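Similarly, the Hardware Specification and Software Dependencies entries pin the stack to PyTorch 2.1, CUDA 11.8, and an NVIDIA RTX 3090. The sketch below, assuming only that PyTorch is installed, prints the local versions next to the reported ones; it is a convenience check, not part of the TrackIME repository.

```python
# Compare the local stack against the setup reported in the paper
# (PyTorch 2.1 compiled for CUDA 11.8, NVIDIA RTX 3090). Illustrative only.
import torch

print(f"PyTorch version: {torch.__version__}  (paper: 2.1)")
print(f"CUDA build:      {torch.version.cuda}  (paper: 11.8)")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}  (paper: NVIDIA RTX 3090)")
else:
    print("No CUDA device detected; the reported GPU setup is unavailable.")
```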