Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
DINTR: Tracking via Diffusion-based Interpolation
Authors: Pha Nguyen, Ngan Le, Jackson Cothren, Alper Yilmaz, Khoa Luu
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5 Experimental Results |
| Researcher Affiliation | Academia | ¹University of Arkansas ²Ohio State University ¹EMAIL EMAIL |
| Pseudocode | Yes | Algorithm 1 Inplace Reconstruction Finetuning |
| Open Source Code | No | The techniques presented in this work are the intellectual property of [Affiliation], and the organization intends to seek patent coverage for the disclosed process. |
| Open Datasets | Yes | TAP-Vid [18] formalizes the problem of long-term physical Point Tracking. It contains 31,951 points tracked on 1,219 real videos. |
| Dataset Splits | No | The paper mentions fine-tuning on datasets such as TAP-Vid and PoseTrack, and evaluation on their respective benchmarks. However, it does not explicitly describe the training/validation/test splits used during model training (e.g., percentages or sample counts for each split). |
| Hardware Specification | Yes | The model is trained on 4 NVIDIA Tesla A100 GPUs with a batch size of 1, comprising a pair of frames. |
| Software Dependencies | No | The paper mentions building on 'LDM [13] and ADM [111]' but does not provide specific version numbers for these or other software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | The model is then fine-tuned using our proposed strategy for 500 steps with a learning rate of 3e-5. The model is trained on 4 NVIDIA Tesla A100 GPUs with a batch size of 1, comprising a pair of frames. We average the attention A_S and A_X in the interval k ∈ [0, T × 0.8] of the DDIM steps with the total timestep T = 50. For the first frame initialization, we employ YOLOX [112] as the detector, HRNet [113] as the pose estimator, and Mask2Former [114] as the segmentation model. We maintained a linear noise scheduler across all experiments... |
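
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration object. The following is a minimal sketch, not the authors' code: the class name `ReportedSetup`, its field names, and the helper `attention_step_range` are hypothetical, and it assumes the attention-averaging interval reads as k in [0, 0.8 × T] with both endpoints included.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ReportedSetup:
    """Hyperparameters as quoted in the Experiment Setup row (names are hypothetical)."""

    finetune_steps: int = 500          # "fine-tuned ... for 500 steps"
    learning_rate: float = 3e-5        # "learning rate of 3e-5"
    num_gpus: int = 4                  # "4 NVIDIA Tesla A100 GPUs"
    batch_size: int = 1                # one pair of frames per batch
    ddim_total_steps: int = 50         # "total timestep T = 50"
    attention_fraction: float = 0.8    # assumed reading of the interval [0, 0.8 * T]
    noise_schedule: str = "linear"     # "linear noise scheduler"

    def attention_step_range(self) -> Tuple[int, int]:
        """Return the (first, last) DDIM step index used for attention averaging.

        Assumption: the upper bound is 0.8 * T rounded to the nearest integer,
        and both endpoints are inclusive.
        """
        upper = round(self.ddim_total_steps * self.attention_fraction)
        return 0, upper


setup = ReportedSetup()
first, last = setup.attention_step_range()
print(f"Average attention over DDIM steps {first}..{last} of {setup.ddim_total_steps}")
```

Under these assumptions the averaging window covers DDIM steps 0 through 40. Details the row does not state, such as optimizer choice or software versions, are deliberately left out rather than guessed.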