Beyond Accuracy: Tracking more like Human via Visual Search
Authors: Dailing Zhang, Shiyu Hu, Xiaokun Feng, Xuchen Li, Meiqi Wu, Jing Zhang, Kaiqi Huang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experiments demonstrate that the proposed CPDTrack not only achieves state-of-the-art (SOTA) performance in this challenge but also narrows the behavioral differences with humans. |
| Researcher Affiliation | Academia | 1School of Artificial Intelligence, University of Chinese Academy of Sciences 2Institute of Automation, Chinese Academy of Sciences 3School of Computer Science and Technology, University of Chinese Academy of Sciences 4Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences 5School of Physical and Mathematical Sciences, Nanyang Technological University |
| Pseudocode | No | The paper describes the proposed method but does not include a clearly labeled pseudocode or algorithm block. |
| Open Source Code | Yes | Code and models are available at https://github.com/ZhangDailing8/CPDTrack. |
| Open Datasets | Yes | Our training data includes the training splits of VideoCube[], LaSOT[9], GOT-10k[19], and TrackingNet[21]. |
| Dataset Splits | No | The paper mentions using training splits of various datasets (VideoCube, LaSOT, GOT-10k, TrackingNet) but does not explicitly describe how these are further split into training, validation, and test sets for their own experiments, or refer to predefined validation splits with specific details like percentages or counts for hyperparameter tuning. |
| Hardware Specification | Yes | The model is trained on a server with four A5000 GPUs and is tested on an A5000 GPU. |
| Software Dependencies | No | The paper mentions the AdamW optimizer, ViT-B encoder, and MAE pre-trained parameters, but does not specify specific version numbers for software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | We train the model with AdamW[66] optimizer and set the learning rate of the encoder to 1e-5, the decoder and remaining modules to 1e-4, and the weight decay to 1e-4. The model is trained for a total of 300 epochs with 60k image pairs per epoch. The learning rate decreases by a factor of 10 after 240 epochs. |
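The reported optimizer settings can be sketched as a PyTorch configuration. This is a minimal illustration, not the authors' code: the `encoder` and `decoder` modules below are hypothetical stand-ins for CPDTrack's actual components, which are defined in the linked repository.

```python
import torch
from torch import nn

# Hypothetical stand-ins for CPDTrack's ViT-B encoder and decoder modules.
encoder = nn.Linear(8, 8)
decoder = nn.Linear(8, 8)

# Parameter groups mirroring the reported setup:
# encoder lr 1e-5, decoder (and remaining modules) lr 1e-4, weight decay 1e-4.
optimizer = torch.optim.AdamW(
    [
        {"params": encoder.parameters(), "lr": 1e-5},
        {"params": decoder.parameters(), "lr": 1e-4},
    ],
    weight_decay=1e-4,
)

# The learning rate decreases by a factor of 10 after 240 of the 300 epochs.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[240], gamma=0.1
)
```

A training loop would call `scheduler.step()` once per epoch so that both parameter groups are scaled by 0.1 at epoch 240.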