Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

DINTR: Tracking via Diffusion-based Interpolation

Authors: Pha Nguyen, Ngan Le, Jackson Cothren, Alper Yilmaz, Khoa Luu

NeurIPS 2024 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5 Experimental Results
Researcher Affiliation | Academia | ¹University of Arkansas ²Ohio State University ¹EMAIL EMAIL
Pseudocode | Yes | Algorithm 1: Inplace Reconstruction Finetuning
Open Source Code | No | The techniques presented in this work are the intellectual property of [Affiliation], and the organization intends to seek patent coverage for the disclosed process.
Open Datasets | Yes | TAP-Vid [18] formalizes the problem of long-term physical Point Tracking. It contains 31,951 points tracked on 1,219 real videos.
Dataset Splits | No | The paper mentions fine-tuning on datasets like TAP-Vid and PoseTrack, and evaluation on their respective benchmarks. However, it does not explicitly provide details about the specific training/validation/test splits used during model training (e.g., percentages or sample counts for each split).
Hardware Specification | Yes | The model is trained on 4 NVIDIA Tesla A100 GPUs with a batch size of 1, comprising a pair of frames.
Software Dependencies | No | The paper mentions building on 'LDM [13] and ADM [111]' but does not provide specific version numbers for these or other software dependencies like Python, PyTorch, or CUDA.
Experiment Setup | Yes | The model is then fine-tuned using our proposed strategy for 500 steps with a learning rate of 3e-5. The model is trained on 4 NVIDIA Tesla A100 GPUs with a batch size of 1, comprising a pair of frames. We average the attention A_S and A_X in the interval k ∈ [0, T × 0.8] of the DDIM steps with the total timestep T = 50. For the first frame initialization, we employ YOLOX [112] as the detector, HRNet [113] as the pose estimator, and Mask2Former [114] as the segmentation model. We maintained a linear noise scheduler across all experiments...
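
The reported experiment setup can be collected into a small configuration sketch. The values below are those quoted in the Experiment Setup row; the class name `FinetuneSetup`, its field names, and the helper `attention_averaging_steps` are hypothetical and do not come from the paper or any released code. The helper assumes the attention-averaging interval reads as k in [0, 0.8 · T] with both ends inclusive, and that k indexes DDIM steps; both are assumptions about the extracted text, not confirmed details.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FinetuneSetup:
    """Hypothetical container for the hyperparameters quoted in the summary."""
    finetune_steps: int = 500          # fine-tuning steps
    learning_rate: float = 3e-5        # fine-tuning learning rate
    num_gpus: int = 4                  # NVIDIA Tesla A100 GPUs
    batch_size: int = 1                # one pair of frames per batch
    ddim_total_steps: int = 50         # total timestep T
    attention_fraction: float = 0.8    # assumed reading of "T 0.8" as 0.8 * T
    noise_schedule: str = "linear"     # linear noise scheduler


def attention_averaging_steps(total_steps: int, fraction: float) -> list[int]:
    """Return the step indices k in [0, fraction * total_steps], inclusive.

    Inclusive endpoints are an assumption; the summary does not say
    whether the upper bound is included.
    """
    # Small epsilon guards against floating-point results such as 39.999...
    upper = math.floor(total_steps * fraction + 1e-9)
    return list(range(0, upper + 1))


setup = FinetuneSetup()
steps = attention_averaging_steps(setup.ddim_total_steps, setup.attention_fraction)
print(steps[0], steps[-1], len(steps))
```

Under these assumptions, the attention maps would be averaged over the step indices returned by `attention_averaging_steps`; a half-open interval or a reversed timestep ordering would change which steps are selected.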